Hi, 2010/11/25 Markus Wanner <mar...@bluegap.ch>: > Eliot, > > On 11/23/2010 09:43 PM, Eliot Gable wrote: >> I know there has been a lot of talk about replication getting built into >> Postgres and I know of many projects that aim to fill the role. However, >> I have not seen much in the way of a serious attempt at multi-master >> write scaling. > > Postgres-XC and Postgres-R are two pretty serious projects, IMO.
Yes. Please visit http://postgres-xc.sourceforge.net/ for details. >> I understand the fundamental problem with write scaling >> across multiple nodes is Disk I/O and inter-node communication latency >> and that in the conventional synchronous, multi-master replication type >> setup you would be limited to the speed of the slowest node, > > That's not necessarily true for Postgres-R, which is why I call it an > 'eager' solution (as opposed to fully synchronous). While it guarantees > that all transactions that got committed *will* be committable on all > nodes at some time in the future, nodes may still lag behind others. > > Thus, even a slower / busy node doesn't hold back the others, but may > serve stale data. Ideally, your load balancer accounts for that and > gives that node a break or at least reduces the amount of transactions > going to that node, so it can catch up again. > > Anyway, that's pretty Postgres-R specific. Right. In the case of Postgres-XC, tables can be partitioned (we call "distributed") among cluster nodes so that writing can be done in parallel. > >> plus the >> communication protocol overhead and latency. However, it occurs to me >> that if you had a shared disk system via either iSCSI, Fiber Channel, >> NFS, or whatever (which also had higher I/O capabilities than a single >> server could utilize), if you used a file system that supported locks on >> a particular section (extent) of a file, it should theoretically be >> possible for multiple Postgres instances on multiple systems sharing the >> database to read and write to the database without causing corruption. > > Possible, yes. Worthwile to do, probably not. We may be suffered from synchronizing cache on each database. > >> Has anyone put any thought into what it would take to do this in >> Postgres? Is it simply a matter of making the database file interaction >> code aware of extent locking, or is it considerably more involved than >> that? It also occurs to me that you probably need some form of >> transaction ordering mechanism across the nodes based on synchronized >> timestamps, but it seems Postgres-R has the required code to do that >> portion already written. > > If you rely on such an ordering, why use additional locks. That seems > like a waste of resources compared to Postgres-R. Not to mention the > introduction of a SPOF with the SAN. > >> Wouldn't this type of setup be far >> simpler to implement > > That's certainly debatable, yes. I obviously think that the benefit per > cost ratio for Postgres-R is better :-) > >> and provide better scalability than trying to do >> multi-master replication using log shipping or binary object shipping or >> any other techniques? Postgres-XC uses combination of replicated table and distributed (partitioned) table, not just simple replication. > > It's more similar to replication using two phase commit, which provably > doesn't scale (see for example [1]) And using a SAN for locking > certainly doesn't beat 2PC via an equally modern/expensive interconnect. > >> Wouldn't it also be far more efficient since you >> don't need to have a copy of your data on each master node and therefor >> also don't have to ship your data to each node and have each node >> process it? > > You have to ship it from the SAN to the node, so I definitely don't > think so, but see this as an argument against it. Each having a local > copy and only exchange locking information and transactional changes > sounds like much less traffic overall. > > Regards > > Markus Wanner > > > [1]: The Dangers of Replication and a Solution, Gray et al, In Proc. of > the SIGMOD Conf., 1996, > http://research.microsoft.com/apps/pubs/default.aspx?id=68247 > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > Cheers; --- Koichi Suzuki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers