On Thu, May 31, 2018 at 11:00 PM, MauMau <maumau...@gmail.com> wrote:
> 2018-05-31 22:44 GMT+09:00, Robert Haas <robertmh...@gmail.com>:
>> On Thu, May 31, 2018 at 8:12 AM, MauMau <maumau...@gmail.com> wrote:
>>> Oh, I didn't know you support the FDW approach mainly for analytics.
>>> I guessed the first target was OLTP read-write scalability.
>>
>> That seems like a harder target to me, because you will have an extra
>> hop involved -- SQL from the client to the first server, then via SQL
>> to a second server.  The work of parsing and planning also has to be
>> done twice, once for the foreign table and again for the underlying
>> table.  For longer-running queries this overhead doesn't matter as
>> much, but for short-running queries it is significant.
>
> Yes, that extra hop and double parsing/planning were the killer for
> our performance goal when we tried to meet our customer's scale-out
> needs with XL.  The application executes 82 DML statements in one
> transaction.  Those DMLs consist of INSERTs, UPDATEs and SELECTs that
> each access only one row through a primary key.  The target tables are
> only a few, so the application PREPAREs a few statements and EXECUTEs
> them repeatedly.  We placed the coordinator node of XL on the same
> host as the application, and the data nodes and GTM on other
> individual nodes.
I agree that there's double parsing happening, but I am hesitant to
agree with the double planning claim.  We do plan, say, a join between
two foreign tables on the local server, but only to decide whether it's
efficient to join locally or on the foreign server.  That means we
create foreign paths for scans on the foreign tables (possibly as many
parameterized paths as there are join conditions) and one path for the
join pushdown; that's it.  We then create local join paths, but we need
those to decide whether it's efficient to join locally and, if so,
which way.  We do not create paths describing how the foreign server
would plan that join.  That's not double planning, since we do not
create the same paths both locally and on the foreign server.

In order to avoid double parsing, we might want to find a way to pass a
"normalized" parse tree down to the foreign server.  We would need to
normalize the OIDs in the parse tree, since those may differ across the
nodes.

>> I don't know what "node management" and "failure detection/failover"
>> mean specifically.  I'd like to hear proposals, though.
>
> That's nothing special or new.  Things like:

That's a good summary of what we need here.  Thanks for the summary.

> * Define a set of nodes that can join the cluster.
> * Initialize or configure a node according to its role in the cluster.
> * Decommission a node from the cluster.
> * Define a node group in which all member nodes have the same data set
> for redundancy.
> * One command to start and shut down the entire cluster.

Right.

> * System tables to display the member nodes and node groups.

I think we need system tables on each node to store, not just display,
the cluster configuration as seen by that node.  But that's a bit of a
detail.

> * Each node's in-memory view of the current cluster state.
> * How each node monitors which other nodes.

That's where we can have multiple ways.
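Coming back to the "normalized" parse tree idea above: a minimal,
purely illustrative sketch of what OID normalization could mean.
Nothing here is a PostgreSQL API; the catalogs and the tree shape are
made up.  The idea is just that node-local relation OIDs get replaced
by schema-qualified names before shipping, and the receiving node
resolves the names back to its own OIDs.

```python
# Illustrative sketch only: "normalize" a parse tree by replacing node-local
# relation OIDs with (schema, relname) pairs, then resolve those pairs back
# to the remote node's own OIDs on arrival.  Not PostgreSQL code.

# Each node's private mapping between OIDs and schema-qualified names.
local_catalog = {16384: ("public", "orders"), 16390: ("public", "customers")}
remote_catalog = {("public", "orders"): 24576, ("public", "customers"): 24580}

def normalize(tree):
    """Replace each node-local OID with a portable (schema, relname) pair."""
    return [dict(node, relation=local_catalog[node["relation"]])
            for node in tree]

def denormalize(tree):
    """Resolve (schema, relname) pairs back to this node's own OIDs."""
    return [dict(node, relation=remote_catalog[tuple(node["relation"])])
            for node in tree]

# A toy "parse tree": two range-table entries to be joined.
parse_tree = [{"rte": 1, "relation": 16384}, {"rte": 2, "relation": 16390}]
shipped = normalize(parse_tree)      # safe to send across nodes
remote_tree = denormalize(shipped)   # OIDs now valid on the remote node
```

The point is only that names, unlike OIDs, mean the same thing on every
node, provided the schemas are kept consistent across the cluster.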
It's either each node monitoring the other nodes, or a kind of
watchdog or central place (not necessarily a node; a file shared
across the nodes might as well serve that purpose) from which we can
fetch the "last known" state of the cluster.  But that, along with the
split-brain problem, is a much larger problem to solve.  I agree we
need some way to know the "last known" status of the cluster and then
correct it when nodes have conflicting views, and to do all of that
without much communication overhead.

> * Elect a new primary node within a node group when the current
> primary node fails.

Well, some configurations might want more than one primary or
coordinator node.  Having a single primary in itself creates an SPOF,
which should be avoided.  But then there will be some workloads that
are happy with a single primary and the risks arising from that.

> * Whether each node group should be configured with a master-slave
> replication topology, or a multi-master topology like MySQL Group
> Replication

Instead of a master-slave configuration, we might want to use logical
replication or some such method to create replicas of tables on
multiple nodes in the cluster and let the optimizer take advantage of
that for join pushdown or load balancing.

> Some of the above may end up with XL's things like
> pgxc_node/pgxc_group system tables, the pgxc_ctl command, CREATE/DROP
> NODE/NODE GROUP commands, etc.

I agree.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company