Re: [GENERAL] CREATE TABLE with REFERENCE

Dmitry Tkach Tue, 29 Jul 2003 12:03:55 -0700

Jonathan Bartlett wrote:

NOTE - after writing all this, I did think of a possible solution, but I'm not sure if PG can handle it. If I made a table called "object" with one column, the object_id, and then had EVERY table inherit from this table. Then, I could have my constraints set up against this master table.

Ummmm... yeah, that would be a great way to do that *if* the "inheritance" emulation in sql worked that way ... Unfortunately, it does not :-(

That's exactly the reason I called it 'half-baked' in one of the earlier messages... When you inherit table B from table A, B will have all the columns A has, but setting up an FK on table C against A won't work, because the FK will check for the key to be present in A *itself*, not in A or any of its inherited children, as one would expect...

What you *could* do, though, if you really wanted is to kinda emulate that inheritance on your own, with something like:

create table A
(
   id serial primary key,
   type text not null
);
create unique index a_idx on A (id,type);

create table B1 ( id int primary key, type text not null default 'b'; stuff text foreign key (id,type) references A(id,type) on delete cascade on update cascade deferrable initially deferred );

create table B2 ( id int primary key, type text not null default 'b1', stuff text, foreign key (id,type) references A(id,type) on delete cascade on update cascade deferrable initially deferred );

create table C ( id int not null references A on delete cascade on update cascade initially deferred, note text );

... now you can insert notes for either B1 or B2 into C

This kinda works, but just seems like too much trouble to go through - it would be nice if the 'inheritance' support could do something like that for you automatically, but, since it doesn't, I'd rather stick with the old good 'plain sql' solution - get rid of A, merge B1 and B2 together (just B), and make C reference B.

If you can combine your queries with a union, your table layouts must be very similar if not identical. Why not put everything into the same table then, and just have an FK between that table and the notes?
No, there are _parts_ that are very similar.  I don't know where this
whole "table-combining" kick came from, but I've usually found that it
ends in a disaster.

Then you need to normalize your schema first - extract those "very similar" parts, and put them into the single table, and make your notes table reference that one, then create specialized table(s) on the side, that will contain those columns that are different between the objects, and make them reference your 'master' table too.

What do you mean by soft-delete? Leaving orphaned notes behind? Bad idea...

I have a boolean flag that says "active". I don't ever actually purge data. There are times when it is useful to come back in and look at what's been "deleted". From the user-interrface standpoint it has been deleted, but we can still go back in and retrieve records after they are gone.

Well... That's the 'GUI delete'... Sooner or later you will want to do the 'real' delete - either to remove an object that just was created by mistake, and should not be there at all, or simply to clean up your database, and remove the stuff that has been sitting there for years, being 'inactive'

How can you be sure? What if you get hit by a bus, and another admin, who is not as knowledgeable as you are goes to the db, and runs an insert with plain sql?

They should read the documentation.

I am sure, they will... *after* they screw up the database, and begin wonderring what's wrong with it :-)

If they want to run an insert with
SQL, they should at least be smart about it :)  If someone's too stupid to
read documentation, they are going to screw up the database no matter
what.

Not really... If your database schema is thoughtfully designed, one has to be *really* smart to be able to screw something up.

Also, even if it was indeed only possible because of a broken code, you are not saying that your code is bug-free, are you?
If it's not bug-free, having a good database schema isn't going to save
me.

No, it is not going to save *you*, but it *is* going to save your *data*

Even, if it was, postgres is not, and your filesystem and OS are not either. If the database crashes in the middle of your insert, you'll end up having inconsistent data.

Transactions will handle that one.

No, they won't, unless you actually use them :-)

If I understand your statement ('what would the harm be') correctly, and you just don't care about your data consistency, then, I guess, you are rigfht - you don't need any constraints... but, in that case, I don't

You are missing the point. There are many things that must be balanced:

* Ease of programming / speed of development

* Correctness of code

* Consistency of data
I can get much further with my system on all three points than I can with
yours.  With mine, I have a single, tested function that I can use
anywhere.  This hits off the first two.  While this prevents me from
having a database-checked #3, it still gives me consistent data because we
don't do hard-deletes and we have tested and verified #2.  If you can show
me how to get te ease-of-programming and correctness with your approach,
I'd be happy to use it.

Easy - merge your tables together the way I described above. Your ease of programming will not be affected - your obj_html() function will still work, and won't even require any changes. *Moreover* - if you ever need to add more object types (or modify the existing ones), you will be able to do that with much less effort, then you would need to invest now, with your current schema (in most of the cases, you'll be able to do those kinds of enhancements without even touching your schema *at all*).


I do care about data consistency and concurrent access.  And my data is
consistent.  I use many of the RDBMS features of postgres such as views /
subselects, transactions, triggers, etc.  The fact that I have instances
where they fall short and I have to do it in the application just shows
that our tools are incomplete, not that I don't care about data

No, it does not show that they are incomplete, it shows that you are not using them the right way :-)

consistency.  In fact, I have trouble thinking how anyone could have
sensical data with everything joined together into one uber-table, which
you seem to be advocating.

Not *one* table. I never advocated that. It is perfectly normal to split your data into different tables *vertically* (i.e. things that do not have any intersection between their data, should go into different tables), but it very rarely (if at all) makes any sense to split it *horizontally* (so that identical columns sit in different tables, just because your application interprets them differently) - the 'early' indication of the problems caused by the latter approach is the temptation to create 'multitable references' - this can always easily be avoided by eliminating those 'horizontal divisions', and doing so will necessarily (and immediately) benefit *all three* of your "balancing points", that you mentioned earlier.

We ahve a standard creation process. If someone is an idiot, that can't be helped.

Not 'helped' entirely, but the amount of damage *can* and *should* be limited. It is an unfortunate reality of life - not all people are smart (even, some of the ones with access to the database are not). It is definitely a bad programming practice to assume otherwise.

What if someone pulls out several drives from a running RAID?

What if someone inserts bad records (even if they are inconsistent)? What if someone steals the server?

The first oen and the last one are hardware related. There are ways to deal with those too (a good start will be putting a lock on the server room for example), but they are beyond the current topic.

As for inserting bad records - that's *exactly* what properly designed constraints are supposed to prevent.

If you wind up with problems, you have to fix them. The fact that problems are possible does not make them likely. There is much more likelihood of a programmer screwing up writing their 14th copy of note_html and the associated table structure than there is that someone pulls a number out of their butt.

Sure. There should not be copies. Code duplication is a software-world analog of poorly designed schema problem in the database world. Both are bad, and should be avoided. I never suggested that you eliminate the latter at the expense of the former. Your note_html() looks great to me, and I agree, that it should be kept that way... it's the database side that look s problematic.

Where would they get the number if not
from the sequence?  "Oooh, I know, I'll insert 10 as the primary key, jsut
for kicks!  I'll even hardcode that into the app!"  Not a very likely
scenario.

Your programmers must be really smart :-) Are you saying that you have never seen a person writing a piece of sql like: insert into mytable (id, data) select max(id) + 1 from mytable, 'mydata' ???

If so, you must be really lucky :-)

What if you load your database from a backup and forget to reinit the sequence?

If it's being loaded from backup, the schema reinits the sequence on schema-load.

Exactly. But 'copy from ' does *not* - so, after you have loaded, your sequnce next_val () will return 1.

If that was the case, you would not be able to combine them with a union, as you said you do...
No, I said that there were certain instances that this happens.  Not that
it is the norm.  In the cases where I do combine them with a union, I am
only combining the similar parts, not the whole shebang.

And that's what you should do - similar (common) parts go to the same table, to which your notes are linked, the differences go to other tables, and get linked to your master table the same way. You can then create a bunch of views to make it *look* exactly the same way as it is now to the programmer, so that your application code will not be affected at all, but your schema will be nicely normalized, and you'll be able to set up constraints on it properly.

Splitting the notes table would be pointless. Why do it?

Because that would make it possible to use the constraints.

Maybe the constraint system should be extended to allow it to check across multiple tables.

Maybe :-) That was my last suggestion in the original message - you can always write a trigger to do that... It would be the worst of the three other possibilities we've considered, but still better, then what you have now.

The OID type/column is ideal for this.

No, it isn't. Not all tables have oids. The ones that do, do not guarantee, that they will be unique. Even if you do that by hand (create unique index on table(oid)), there is still no way to guarantee their uniqueness across entire database.

Sadly, it is only 32 bits, and they are not storing an OID/table lookup like I think they should. That would enable really powerful database applications that are way too much work to make today.

I don't know what you are talking about :-) 'too much work' ... come on. If you think of a solution, that is too much work, it does not mean, that it is the *only* possibility - just think again :-)

With that kind of thing, you could
even do record merges with automatic database support.

What do you mean by "record merges"? Any meaning of that phrase I can imagine can be easily done with the currently supported database features... so, you must mean something different by that, I assume...

Also, if one adopts your earlier point, it can also be argued, that it is equally 'nuts' to have notes about Payments stored together with notes about Sponsors. Those notes have just as much to do with each other as the objects they annotate. :-)

But they are all "notes" on "objects".

Sure... and all of your payments, sponsors, and whatever else you have are 'objects', arent' they? I did not say that your notes have nothing to do with each other - I only said that they have only as much to do with each other as your different kinds of objects do. If putting all the objects (or at least, some parts of them) into the same table doesn't make sense to you, I don't see why putting the notes together should. And the other way around - if treating notes in a generic way does make sense to you, I don't see any reason why doing the same for the 'objects' should not.

If you insist that Payments must be separate from Sponsors, the same exact argument should be applied to their respective notes
Why?  All of the note's attributes function exactly the same, while none
of Payments and Sponsors function the same.  Again, you have yet to show
how you can make a mechanism that is as easily extensible as mine is.

I did - just put the damn objects into the same table :-) It is not just "as easily" extensible as yours, but *much easier* extensible rather

 The
fact that database constraints aren't available to support it is a fault
of the database, not mine.

Your "fault" is that they *are* available, but you just refuse to use them :-)

Hmmm... changing table names, having to go in and recreate an exact copy
of the schema every time that is added.

I don't know what you are talking about... What table names? What copy if schema???? You lost me completely. I *never* suggested anything remotely like that.

You want that same one line of code, working against the properly designed and normalized sql schema, that lets you rely on the database top ensure your data consistency and access efficiency.

Yes, but to set it up requires a full database change.

Yep. It does. No argument here. Frankly, I am not trying to convience you that you should do that database change now (although, if I were you, I would *certainly* do it right away, before I get into any more trouble - changing - what 5 - 10 tables? - doesn't look like *that* much of a deal to me, compared to the benefit you'd by from it). The discussion was about whether or not your schema was *initially* created in a good way.

Whether or not to change it now is another question - as I said, I certainly would, but this is your decision entirely, I am not going to try to talk you into it :-)

Not at all. If you give up your idea about splitting your payments from your sponsors for example, you won't need to modify your schema *at all* if you need to add another object type, or another kind of note, or whatever - all you'd need to do would be to implement the new application logic in your application, where it belongs, and be done with it. No need to even touch your schema at all.
I don't see why you want to make my entire database into one giant table.
Why bother with tables at all in that case?

Not one table - see above. And I don't understand your second question - "why bother with tables"... what's the alternative?


First of all, inheritance wouldn't work in case of the notes, as it is a
one-to-many relationship.

That doesn't matter (it indeed won't work 'out of the box' - see the beginning of this email - but for totally different reason)

No, it's not. At least with PostgreSQL, you can only have ONE line of inheritance (at least as far as I am aware - please correct me if I'm wrong). In that case, all of these would have to have a base class of "notes".

No. They would have a base class of "Object" (or whatever), and the 'notes' would be linked to the Object. This would in fact, be a *beatiful* solution... it's a shame really that it doesn't work. I am wonderring if what postgres does with those inherited FK constraints is specified by the standard, or if it is just an implementation feature, that can be improved...

Of course, if I wanted to add other similar features, I would be
unable to.  For example, if I wanted to add a "changelog" table, and be
able to attach last_modified, creator, etc to certain tables, I could not
do so without adding them to all of them.

No, you would be perfectly able to add "features" to base class as much as you want. That's the beauty of the inheritance. The "one-to-one" features can just be added as columns to the base table (you can actually do that with even the current half-baked inheritance implementation, you can even do that without any inheritance at all - if all of your common features sit in the same table, like I suggest, you can just keep adding columns to that table as you like). The one-to-many ones can go to additional tables, and get linked to the object id the same way your notes do.


The approach I have let's me pick and choose any additional feature I want
to add to any record.

... sure. Except if you wanted to add a new feature to *all* of the objects... Then you would either have to use your "notes hack", and create a separate table with broken links, or you would have to go and modify each and every of your "object" tables separately. That's another reason for you to extract the common features into a "supertable" - a change like this would then become a breeze.

Again, this would require modifying and testing that trigger every time I
want to add a new thing to take notes on.
Sure, if you implement in such way. But not if you give it some thought in advance, and come up with an implementation that would be generic enough not to care about your application-specific differences between sponsors and payments :-)
Actually, they are database-specific.

No, they are not :-) The database doesn't have any notion of payments, sponsors, notes etc. All this stuff is application specific. Database specific are tables, columns and constraints. Everything else is application logic, that belongs to the application.

Dima


---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Re: [GENERAL] CREATE TABLE with REFERENCE

Reply via email to