RFC: a SQL-representing object

Darren Duncan Fri, 30 May 2003 12:56:22 -0700

Hello (and in particular to database module makers/users),

I am at a point in the design/development of my "Rosetta" database
abstraction tool where I am considering splitting the framework into more
independent pieces than was previously planned, namely splitting up the
"core", so that it is easier for developers to adopt smaller pieces as
they see fit rather than having to take parts they don't want.  So I would
like to request some specific comments/feedback/advice from this list.


Essentially, I would like for my work to be useful to people, but I don't
want to give people the impression that I have a "not invented here"
problem.  At the same time, I don't want to limit my creativity.

P.S.  Please reply both directly to me and to [EMAIL PROTECTED] (I am not
subscribed to the mailing list, so both of us should have copies).

The questions appear in the middle and at the end, so please read the
whole letter before replying to any of it.

The "core" of Rosetta, as it currently exists or is planned, already
consists of 4 conceptually separate pieces, but they have been distributed
together because I thought that is what most people do or would expect.
The 4 pieces, which I would consider separating into their own tgz
distributions, are currently:

1. "Rosetta::*" - A bunch of POD files providing documentation.  These
make up a significant part of the weight of the whole distribution.  They
describe the design of the system and how to use it or write extensions.

2. "Rosetta::Locale::*" - Essentially a bunch of constant data that
is stored as Perl code (an anonymous hash declaration).  It is mainly a
collection of user-readable text strings mapped to short machine-readable
codes; when certain parts of Rosetta want to provide a message for display
to the user, usually an error message, they "throw" the code (with
optional variable values for interpolation), and the matching
user-readable string in the user's locale/language is fetched for display.
The idea is similar to how some existing systems, such as Mac OS X, handle
user-text which is built into the program; the text for each user language
is in a separate resource file that comes with the program, and adding
support for more languages is as simple as adding a file.  Since this is
not user-defined text, it does not make sense to store it in a code table
in a database, or if we did, then those would be populated from the above
Locale files.  Error messages et al are part of the program.
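To make the Locale idea concrete, here is a minimal sketch, written in Python rather than Perl purely for illustration.  Every name in it (the message codes, `CodedError`, `render_message`) is hypothetical and not taken from Rosetta; it only shows a code being "thrown" with values for interpolation and the matching text being fetched for the user's language.

```python
# Hypothetical sketch: short machine-readable codes mapped to
# user-readable templates, one table per language.
LOCALE_EN = {
    "TABLE_NOT_FOUND": "The table '{name}' does not exist.",
    "CONNECT_FAILED": "Could not connect to '{host}'.",
}
LOCALE_FR = {
    "TABLE_NOT_FOUND": "La table '{name}' n'existe pas.",
}
LOCALES = {"en": LOCALE_EN, "fr": LOCALE_FR}


class CodedError(Exception):
    """Carries a message code plus optional values for interpolation."""

    def __init__(self, code, **values):
        super().__init__(code)
        self.code = code
        self.values = values


def render_message(error, lang, fallback="en"):
    """Fetch the text for an error code in the requested language."""
    table = LOCALES.get(lang, {})
    template = table.get(error.code) or LOCALES[fallback].get(error.code)
    if template is None:
        return error.code  # no text available anywhere; show the bare code
    return template.format(**error.values)
```

In this sketch, supporting another language would amount to adding one more table to `LOCALES`; a code missing from that table falls back to the English text.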

3. "Rosetta::Schema::*" - These classes are intended to do for SQL what a
DOM does for XML, which is to have an object that is a fully-parsed
representation of an instruction string.  Given that different RDBMS
vendors use their own variants of SQL, in many ways the same and in some
ways different, what my classes would do is represent a normalized
superset of the various SQL dialects.  In a manner of speaking, they would
constitute the parsed version of a new SQL dialect that can describe any
task the old dialects could, including some proprietary ones.  The
difference in my case is that my "SQL dialect" is intended to always be an
object and not a string.  That said, it is intended that SQL strings in
any existing dialect could be parsed into objects of my classes, and new
SQL strings in another dialect could be generated.  But I would not be
doing either of those myself in these classes.  Sometimes one SQL
statement in a source dialect may become multiple statements in another,
if one wants to emulate a feature.  I call these "Schema" because the
majority of SQL details are used for creating objects in an RDBMS schema
and invoking them.  "Objects" being a generic term to include tables,
views, stored procedures and functions, and so on.  Given that in practice
database "views" look the same as "select statements", as do "subqueries"
or "cursors", I am using the same structure to describe all of them;
insert/update/delete SQL is just the inverse operation of a select, and
can be generated from a view definition.  Schema objects are not "live";
they do not "do" anything; they are simply containers of program "data".
An advantage of using these objects is that they work with databases whose
native interface is not SQL just as well as with those whose native
interface is; also, one could render these objects into Perl code instead
of SQL to perform the same functionality at the application level, and the
calling application wouldn't have to know; no parsing of SQL is necessary.
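As a rough illustration of what "an object rather than a string" could look like, here is a small sketch in Python (not Perl, and not the actual Rosetta::Schema design).  The class names, the type mappings, and the rendering functions are all assumptions made for the example.  The renderers are deliberately kept outside the container classes, since the classes themselves are only meant to hold data.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str           # dialect-neutral type name, e.g. "int" or "text"
    nullable: bool = True


@dataclass
class Table:
    name: str
    columns: list = field(default_factory=list)   # list of Column


@dataclass
class View:
    """A query description; the same shape could serve a subquery or cursor."""
    name: str
    source: str              # name of the table being read
    columns: list = field(default_factory=list)   # list of column names


# Illustrative type names for two dialects (an assumption of this sketch).
TYPE_NAMES = {
    "mysql": {"int": "INT", "text": "TEXT"},
    "oracle": {"int": "NUMBER(10)", "text": "CLOB"},
}


def render_create_table(table, dialect):
    """Generate a CREATE TABLE string for one dialect from a Table object."""
    types = TYPE_NAMES[dialect]
    parts = []
    for col in table.columns:
        null = "" if col.nullable else " NOT NULL"
        parts.append(f"{col.name} {types[col.data_type]}{null}")
    return f"CREATE TABLE {table.name} ({', '.join(parts)})"


def render_select(view):
    """Generate the SELECT that a View object describes."""
    return f"SELECT {', '.join(view.columns)} FROM {view.source}"


def render_insert(view):
    """Generate the inverse operation (an INSERT) from the same View object."""
    marks = ", ".join("?" for _ in view.columns)
    return f"INSERT INTO {view.source} ({', '.join(view.columns)}) VALUES ({marks})"
```

For example, one `Table("person", [Column("id", "int", nullable=False), Column("name", "text")])` object can be rendered for either dialect, and one `View` object yields both its select and the matching insert.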

4. "Rosetta::Engine::*" - These classes are thin programmatic interfaces
to a runtime environment of sorts for Schema objects.  They are "live" in
the same way that DBI is "live", and like DBI, they mostly pass the
workload to RDBMS specific "Driver" modules, which either interface to
databases doing the real work or implement some of the work themselves.
Mostly the "Engine" modules just store some context-specific environment
variables and do some error trapping/reporting for the Driver modules.
How they mainly differ from DBI is that they take objects for all their
inputs (usually Schema objects) rather than SQL strings.  In that respect,
they are a higher level of abstraction than DBI.  Unlike most other DBI
abstraction modules, mine do not expose their implementation (underlying
DBI objects) to callers, which saves applications that use my modules
from breaking when the modules I depend on change.
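The division of labour between Engine and Driver might be sketched as follows.  This is a Python illustration under stated assumptions, not the Rosetta::Engine interface: the `Engine`, `MemoryDriver`, and `EngineError` names are hypothetical, and plain dictionaries stand in for Schema objects.

```python
class EngineError(Exception):
    """Uniform error type reported to callers, whatever the driver raised."""


class MemoryDriver:
    """Stand-in for an RDBMS-specific Driver; keeps rows in memory."""

    def __init__(self):
        self._tables = {}

    def execute(self, command):
        kind = command["kind"]
        if kind == "create_table":
            self._tables[command["table"]] = []
            return None
        if kind == "insert":
            self._tables[command["table"]].append(dict(command["row"]))
            return None
        if kind == "select":
            return [dict(row) for row in self._tables[command["table"]]]
        raise ValueError(f"unsupported command kind: {kind}")


class Engine:
    """Thin layer: holds context, delegates work, traps and reports errors."""

    def __init__(self, driver):
        self._driver = driver   # kept private; callers never touch the driver

    def execute(self, command):
        try:
            return self._driver.execute(command)
        except (KeyError, ValueError) as exc:
            raise EngineError(f"{command.get('kind')}: {exc}") from exc
```

Because callers only ever hold the `Engine`, the driver underneath could be swapped for a different one without the calling code changing.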

Note that the "Driver" modules (corresponding to DBD modules or the
variety of RDBMS-specific hint modules that several other abstraction
frameworks use) were never part of the "core" and they are already in
other distributions; I will not be discussing those in this email, at
least not much.

Out of the above module groups, the one that I thought would be the best
for "setting free" would be the "Schema" ones, because I thought they
would provide the most value or "new blood" to users or creators of other
CPAN modules.  But at the same time I thought this set, which could in
fact be all put in a single module if that would be easier to use, perhaps
should have a new name.

So, my first questions are these: 1. Would a DOM-for-SQL be useful in its
own right to other module developers, and could it therefore grow beyond
its previous intention of being "part of just one framework"?  2. What
should this new module be called?

As a further explanation of what said modules do, they can also be
considered a procedural programming language of sorts, especially suited
for databases, which could be rendered at least in part into multiple
programming languages.  That is, the modules describe invokable procedures
or other objects without actually containing any code.  They are intended
to be rendered/compiled eventually into some RDBMS-native format (usually
with SQL being an intermediary format), and executed by the database.  But
they could also be rendered/compiled into Perl code and then eval'd, if
one wanted.  I have no intention of making an "interpreter", which would
be ungodly slow or complicated.

But it isn't all that complicated, really.  And it is feasible.
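A tiny sketch of the "render to code, then eval" idea, in Python instead of Perl and with hypothetical names throughout: a single comparison is held as data, and can be rendered either as a SQL fragment or as source code for the host language, which is then compiled with `eval`.  The sketch only accepts plain identifiers, a fixed set of operators, and integer values, so that nothing unexpected reaches `eval`.

```python
# SQL comparison operators and their Python equivalents.
PYTHON_OPS = {"=": "==", "<>": "!=", "<": "<", ">": ">", "<=": "<=", ">=": ">="}


def _check(cond):
    """Reject anything this sketch does not know how to render safely."""
    if not cond["column"].isidentifier():
        raise ValueError("column must be a plain identifier")
    if cond["op"] not in PYTHON_OPS:
        raise ValueError("unsupported operator")
    value = cond["value"]
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("only integer values are handled in this sketch")


def render_sql(cond):
    """Render the condition as a SQL fragment."""
    _check(cond)
    return f"{cond['column']} {cond['op']} {cond['value']}"


def render_python(cond):
    """Render the same condition as Python source for a row predicate."""
    _check(cond)
    op = PYTHON_OPS[cond["op"]]
    return f"lambda row: row[{cond['column']!r}] {op} {cond['value']}"


def compile_python(cond):
    """Compile the rendered source once; the result is an ordinary function."""
    return eval(render_python(cond))
```

The description `{"column": "age", "op": ">=", "value": 18}` can be handed to a database as SQL or compiled into a function applied to rows in the application; the caller does not need to know which happened.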

So another question: 3. Are there already any modules on CPAN which store
a parsed representation of a generic programming language, in such a way
as to form an intermediary format of translating a program from one
procedural language to another, and if so then what are they and where are
they listed?  I haven't been able to find any so far.

Now, all of the modules that I have seen so far which generate or parse
SQL seem to be limited to just the most common tasks, mainly creating tables
and doing some select/insert/update/delete operations.  But all of the
select-generating ones I have seen appear to be limited to either working
with one table or working with just a few that are related in a few
specific ways.  Doing anything more complicated requires writing SQL
fragments yourself, and then some of the magic doesn't work anymore.

A question: 4. Would a comprehensive intermediate SQL object be useful to
writers of a lot of these modules, so they can more quickly support the
parsing or generation of more complex SQL, and improve their works?  The
idea is that my module would provide a structured hierarchy of slots (like
a DOM) into which details pulled from SQL being parsed could go, and from
which details for SQL being generated could come.  Isn't it true that
a lot of the trouble of parsing/generating is just coming up with places
to keep all the pieces?  Or would these people not be able to use such a
module?
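To show how a parser and a generator written by different people might meet at such a set of slots, here is a deliberately tiny sketch in Python.  The `Select` class, the toy regular-expression parser, and the generator are all hypothetical; the parser only handles `SELECT cols FROM table [LIMIT n]`, and the generator emits the Transact-SQL style `TOP n` form.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Select:
    """The shared 'slots': neither parser nor generator owns this class."""
    columns: list = field(default_factory=list)
    table: str = ""
    limit: Optional[int] = None


_SIMPLE_SELECT = re.compile(
    r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)(?:\s+LIMIT\s+(\d+))?\s*;?\s*",
    re.IGNORECASE,
)


def parse_simple_select(sql):
    """Toy parser: fill the slots from a very restricted SELECT string."""
    match = _SIMPLE_SELECT.fullmatch(sql)
    if match is None:
        raise ValueError("statement is outside what this toy parser handles")
    columns = [col.strip() for col in match.group(1).split(",")]
    limit = int(match.group(3)) if match.group(3) else None
    return Select(columns=columns, table=match.group(2), limit=limit)


def generate_tsql(select):
    """Toy generator: read the slots and emit a 'TOP n' style statement."""
    top = f"TOP {select.limit} " if select.limit is not None else ""
    return f"SELECT {top}{', '.join(select.columns)} FROM {select.table}"
```

Here `generate_tsql(parse_simple_select("select id, name from person limit 5"))` turns a `LIMIT` style query into a `TOP` style one without either function knowing about the other; they only agree on the slots.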

Now, some possible names I have come up with are (unless they are
already in use):

Rosetta::Schema
Rosetta::DOM
Rosetta::Dictionary
SQL::DOM
SQL::Parsed
SQL::Dictionary
Class::SQL
Class::AbstractSQL
Class::ParsedSQL
Class::SQLDOM

Now, some of the issues I need to keep in mind are:
1. My classes have nothing to do with XML, although their objects could be
serialized into XML, so I am wondering whether "DOM" implies XML and
therefore whether I should avoid it.
2. My classes are meant to also be usable with databases that don't
understand SQL natively (such as older ones), so would having "SQL" in the
name be a problem?  I suspect it may not.  Also, is "SQL" trademarked?
3. Since Rosetta as a whole depends so much on these modules (despite the
effort to make them usable on their own), would it just be better to leave
them named "Rosetta::*" but still distribute them separately?  What would
be the most descriptive of what they do?  Or could it be said that these
"Schema" modules in fact *are* Rosetta and everything else is an extension
to them?

My apologies to you if these seem like tough or ill-defined questions.
But I thought I should get them out sooner rather than later.

Thank you and have a good day.

