On Wed, Feb 24, 2016 at 12:17 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> Hi, Bruce!
>
> The important point for me is to distinguish between different kinds of
> plans: an implementation plan and a research plan.
> If we're talking about an implementation plan, then it should be proven
> that the proposed approach works in this case, i.e. the research should
> already be done.
> If we're talking about a research plan, then we should realize that the
> result is unpredictable, and we would probably need to dramatically
> change our way.
>
> These two things would work with FDW:
> 1) Pull data from data nodes to the coordinator.
> 2) Push down computations from the coordinator to data nodes: joins,
> aggregates, etc.
> It's proven and clear. This is good.
> Another point is that these FDW advances are useful by themselves. This
> is good too.
>
> However, the FDW model assumes that communication happens only between
> the coordinator and a data node. But a full-weight distributed optimizer
> can't be built under this restriction, because it requires every node to
> communicate with every other node if that makes a distributed query
> faster. And as I understand it, the FDW approach currently has no
> research and no particular plan for that.
>
> As I understand from Robert Haas's talk (
> https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0
> )
>
>> Before we consider repartitioning joins, we should probably get
>> everything previously discussed working first.
>> – Join Pushdown For Parallelism, FDWs
>> – PartialAggregate/FinalizeAggregate
>> – Aggregate Pushdown For Parallelism, FDWs
>> – Declarative Partitioning
>> – Parallel-Aware Append
>
>
> So, as I understand it, we never considered the possibility of data
> redistribution using FDW. Probably something has changed since that
> time, but I haven't heard about it.
>
> On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian <br...@momjian.us> wrote:
>
>> Second, as part of this staged implementation, there are several use
>> cases that will be shardable at first, and then only later, more complex
>> ones.
>> For example, here are some use cases and the technology they
>> require:
>>
>> 1. Cross-node read-only queries on read-only shards using aggregate
>> queries, e.g. data warehouse:
>>
>> This is the simplest to implement as it doesn't require a global
>> transaction manager, global snapshot manager, and the number of rows
>> returned from the shards is minimal because of the aggregates.
>>
>> 2. Cross-node read-only queries on read-only shards using non-aggregate
>> queries:
>>
>> This will stress the coordinator to collect and process many returned
>> rows, and will show how well the FDW transfer mechanism scales.
>>
>
> FDW would work for queries which fit the pull-pushdown model. I see no
> plan to make other queries work.
>
>
>> 3. Cross-node read-only queries on read/write shards:
>>
>> This will require a global snapshot manager to make sure the shards
>> return consistent data.
>>
>> 4. Cross-node read-write queries:
>>
>> This will require a global snapshot manager and global transaction
>> manager.
>>
>
> At this point, it's unclear why you don't refer to the work done in the
> direction of a distributed transaction manager (which is also a
> distributed snapshot manager in your terminology):
> http://www.postgresql.org/message-id/56bb7880.4020...@postgrespro.ru
>
>
>> In 9.6, we will have FDW join and sort pushdown
>> (http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html).
>> Unfortunately I don't think we will have aggregate
>> pushdown, so we can't test #1, but we might be able to test #2, even in
>> 9.5. Also, we might have better partitioning syntax in 9.6.
>>
>> We need things like parallel partition access and replicated lookup
>> tables for more join pushdown.
>>
>> In a way, because these enhancements are useful independent of sharding,
>> we have not tested to see how well an FDW sharding setup will work and
>> for which workloads.
>>
>
> This is the point I agree with. I'm not objecting to any single FDW
> advance, because each is useful by itself.
>
>> We know Postgres XC/XL works, and scales, but we also know they require
>> too many code changes to be merged into Postgres (at least based on
>> previous discussions). The FDW sharding approach is to enhance the
>> existing features of Postgres to allow as much sharding as possible.
>>
>
> This comparison doesn't seem correct to me. Postgres XC/XL supports data
> redistribution between nodes, and I haven't heard a single idea for
> supporting this in FDW. You are comparing unequal things.
>
>
>> Once that is done, we can see what workloads it covers and
>> decide if we are willing to copy the volume of code necessary
>> to implement all supported Postgres XC or XL workloads.
>> (The Postgres XL license now matches the Postgres license,
>> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
>> Postgres XC has always used the Postgres license.)
>>
>> If we are not willing to add code for the missing Postgres XC/XL
>> features, Postgres XC/XL will probably remain a separate fork of
>> Postgres. I don't think anyone knows the answer to this question, and I
>> don't know how to find the answer except to keep going with our current
>> FDW sharding approach.
>>
>
> I have nothing against particular FDW advances. However, it's unclear to
> me that FDW should be the only sharding approach.
> It's unproven that FDW can do the work that Postgres XC/XL does. With FDW
> we can have some low-hanging fruit. That's good.
> But it's unclear whether we can have the high-hanging fruit (like data
> redistribution) with the FDW approach. And if we can, it's unclear that
> it would be easier than with other approaches.
> Let's just not call this the community-chosen plan for implementing
> sharding. Until we have the full picture, we can't select one way and
> reject others.
>

I have already pointed out several times that we need XTM to be able to
continue development in different directions, since there is no clear
winner. Moreover, I think there is no one-size-fits-all solution, and
while I agree we need one built into core, other approaches should be
able to exist without patching.

>
> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>