On Thu, 2002-04-04 at 14:17, Oleg Bartunov wrote: Subject: Bidirectional hard joins > > Hi, > > > Here we propose some essential improvement of postgreSQL functionality, > which may provide a great perfomance increase. > > 1. Problem > > The fastest way to find and fetch a record from a table is to > perform a SELECT ... WHERE record.id = value. > Probably, an index scan would be performed for this SELECT. > > Such index scan seems to be fast, but there are some cases where it may > appear too slow. The most evident case is the case of a sub-query, which > can arise as a result of a join or a nested select statement. > > If it were possible to store direct references to database > records in the tables, joins could be implemented in much more effective > way. > > 2. Possible soultion > > Creating a datatype which stores direct reference > to the record (i.e., physical location of the tuple) is only a part of the > solution.
The tid type does exaclty what is needed. > > When a record that is referenced is updated its physical location can be > changed, so the references to it should be updated. To make this > possible, the referenced record should remember all the references to > itself. Thus, we consider the direct tuple references as bidirectional > links, or "bidirectional hard joins". > > These "hard joins" are in some sense similar to hard links in a > filesystem (in this analogy, classic joins are like symbolic links). > > Philosophically, this means a convergence between indexes and tables: a > table behaives like an index for an other table. > > Obviously, this is is a nonrelational feature, and it requires some > special SQL syntax. Below we provide some examples for clarification of > the use of the proposed feature. Or we can just use tid's and ordinary joins to make it a relational feature. IIRC this has been discussed on this list a few months ago. I'm not sure if bi-directional tid usage was discussed, but I can't see how to use them efficiently in a non-overwrite storage manager. ... > 4. Performance > > When direct joins are used in select statements, they can strongly > increase performance. > > Let us examine the query plan of the request ("Find all Bob's > children") from the example above in the present day postgres. > create table man (id SERIAL,name text); > create table child (id SERIAL,name text, parent_id int4 references man(id)); > .. populate the tables ... and create indexes... > explain select child.name from child, man > where child.parent_id = man.id > and man.name = 'Bob Scott'; > > Nested Loop > -> Index Scan using man_n on man > -> Index Scan using child_par on child > > > > In a hypotetical postgres with hard joins it could be: > > Nested Loop > -> Index Scan using man_n on man > -> Direct retrieval on child > > I.e., the for each retrieved "man" record we retrieve the "child" records > directly using hard join. The real overhead for this operation should be > neglible in comparison with index scan. OTOH, if index is in memory and the retrieved tuple is not then the _speed_difference_ could be neglible. > Using the hard joins require some additional overhead in updates. In fact, > after updating the record which takes part in such join, the references > to this record in the other records should be also updated. And this should be in a non-overwriting way. If we just do a standard UPDATE, causing a new heap record to be added this will result in a circle as then the original records references are not valid anymore and so also need to be updated and so on ... > This operation > is not essentially new for postgres as similar things are done with > indexes when an indexed record is updated. Hence, the overhead for updates > is not greater than the overhead for updating indexes. AFAIK indexes are not "updated" but a new index entry is added as the old tuple may be still visible to some other transaction. > 5. Implementation and conclusion > > Effective implementing of hard joins requires hard changes to postgres, > most serious of them probably in the executor, where a new method "fetch > record by reference" should be added in addition to "index scan" and "seq > scan". Also the optimizer should be taught to deal with this. > > The update support is not so hard as it is similar to the updating of > indexes. > > Though the implementation of such hard joins is really a complicated task, > the performance it brings should be tremendous, so we consider discussing > this important. Depending on usage the performance degradation can also be tremendous, as a simple update can trigger an avalance of referencing tid updates ... -------------- Hannu ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])