> Here is a new replication documentation section I want to add for 8.2:
>
> ftp://momjian.us/pub/postgresql/mypatches/replication

...Read the document, as promised...

First paragraph: "(fail over)" is inconsistent with the title, "failover", as are other spots throughout the document. The whole document should be consistent, and I vote for "failover" and not "fail over."

Fourth paragraph: "This "sync problem" is the fundamental difficulty for servers working together". "Sync problem" hasn't been defined. Actually, you're talking about the consistency attribute of the "ACID" properties of all competent databases: Atomicity, Consistency, Isolation, and Durability. At least define the term you are using - probably most easily done in the preceding paragraph.

The fifth paragraph needs a lot more help, I think. How about this alternative:

    So-called "two-phase commit" was developed as a strategy in which
    two or more databases are updated simultaneously and none of the
    data is committed until all are committed. This guarantees
    consistency between the databases, with all propagation delay being
    absorbed by the writer at write time. There are times when this
    propagation delay is large, so sometimes alternatives are worked
    out, which we'll call here "asynchronous updates." In these cases,
    however, there is always a window of time in which some transactions
    can be lost should a failure occur. For this reason, asynchronous
    updates are only used when the possibility of such losses is
    acceptable.

Paragraphs six through to "shared disk failover" seem very awkward to me. I don't like them at all.

"Shared disk failover" has nothing to do with "the sync problem", as it's not a multiple-database solution. It's an uptime, "24 X 7 X 365" issue. Further, it also has nothing to do with disk arrays, though it is often used with RAID to help avoid disk-based corruption problems.

The point about Warm Standby needs to include a warning about WAL: it MUST be sensitive to the semantics of the database design, or else it's fatally flawed. I'm talking about "referential integrity".
That is to say, it's inappropriate to capture updates on a table-by-table basis, as some such systems do (I have no idea what's done by anyone in the PG world on this right now), because an update to one table (esp. an insert) very often goes hand in glove with updates to other tables, and getting one without the other can corrupt a database.

The description of "Continuously running replication server" should include the critical caveat - repeated if you think it's already said elsewhere - that it is ONLY suitable for applications in which a loss of (missing) update data doesn't matter. For example, an airline reservation system would be an inappropriate application for such a "solution", because which seats are available cannot be guaranteed to be correct.

Regarding data partitioning, I strongly disagree with the opening sentence: partitioning doesn't split a database into sets, it splits tables into sets. Data partitioning is often done within a single database on a single server and therefore, as a concept, has nothing whatsoever to do with different servers. Similarly, the second paragraph of this section is problematic. Please define your term first, then talk about some implementations - as written, this is muddying the water. Further, there are both vertical and horizontal partitioning - you mention neither - and each has its own distinct uses. If partitioning is mentioned, it should be more complete.

Next, Query Broadcast Load Balancing... also needs a lot of work. First, it's foremost in my memory that sending read queries everywhere and returning the first result set back is a key way to improve application performance at the cost of additional load on other systems. I guess that's not at all what the document is after here, but it's a worthy part of a dialogue on broadcasting queries. In other words, this has more parts to it than just what the document now entertains. Secondly, the document doesn't address _at_all_ whether this is a two-phase-commit environment or not.
If not, how are updates managed? If each server operates independently and one of them fails, what do you do then? How do you know _any_ server got an insert/update? ... Each server _can't_ operate independently unless the application does its own insert/update commits to every one of them - and that can't be fast, nor does it load balance, though it may contribute to superior uptime for the application.

Next up: I'm not aware of any current products or projects that provide parallel query execution, though Informix might - I can ask a colleague or two. Either way, it's probably best to simply define the term (perhaps in a little more detail) and not mention solutions - they change with time anyway.

While I've never used Oracle's clustering tools, I've read up on them and have customers who use them, and I think this description of Oracle clustering is a misreading of what the Oracle system actually does. A check with a true Oracle clustering expert is in order here.

Hope this helps. If asked, I'm willing to (re)write some of the bits discussed above.

Regards,
Richard

--
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[EMAIL PROTECTED], http://ScienceTools.com/