On Thu, Jun 11, 2015 at 4:03 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> I've been trying to figure out a plan to enable native column stores
> (CS or "colstore") for Postgres.  Motivations:
>
> * avoid the 32 TB limit for tables
> * avoid the 1600 column limit for tables
> * increased performance
>
And better compression ratio.
> We're not interested in perpetuating the idea that a CS needs to go
> through the FDW mechanism.
>
Agreed. It is cleaner to add a ColumnScan node that scans a columnar
table, and possibly a ColumnIndexScan for seeks against an indexed
columnar table.

> Since we want to have pluggable implementations, we need to have a
> registry of store implementations.
>
A real native implementation, where the columnar store sits on par with
heap, gives us arbitrary flexibility to control performance and
transaction behavior, without worrying about compatibility with the
interface you define below. (A toy sketch of such a registry interface
appears at the end of this mail.)

> One critical detail is what will be used to identify a heap row when
> talking to a CS implementation. There are two main possibilities:
>
> 1. use CTIDs
> 2. use some logical tuple identifier
>
I like the concept of a half-row, half-columnar table: the row part
works well for SELECT * and updates, while the columnar part serves
other workloads. Popular columnar-only tables use positional alignment,
which is virtual (no storage cost), to associate the values of each
column. CTIDs are still needed, but not for this purpose. An
alternative is:

1. Allow column groups, where several columns are physically stored
   together;
2. Handle updates through a separate row store table associated with
   each columnar table.

(A toy sketch of this column-group-plus-delta layout also appears at
the end of this mail.)

> Query Processing
> ----------------
>
If we treat columnar storage as a first-class citizen like heap, we can
model it after heap, which makes the changes in the parser, rewriter,
planner and executor follow much more naturally.

Regards,
Qingqing
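To make the discussion a bit more concrete, here is a toy, self-contained
C sketch of what a registry of store implementations with a
begin/next/end scan interface might look like. Every name in it
(ColumnStoreRoutine, register_colstore, and so on) is hypothetical; none
of this is actual PostgreSQL code. The idea is that a ColumnScan node
would look up the routine table for the scanned relation and drive these
callbacks, roughly the way an index scan calls into its access method.

/* Toy sketch of a pluggable column store registry.  All names are
 * hypothetical placeholders, not actual PostgreSQL structures. */
#include <stdio.h>
#include <string.h>

typedef unsigned int ColumnNumber;

/* Opaque per-scan state owned by the store implementation. */
typedef struct ColumnScanDesc ColumnScanDesc;

/* Callback table each pluggable column store provides, loosely modeled
 * on how index AMs and FDWs expose a fixed set of entry points. */
typedef struct ColumnStoreRoutine
{
    const char *name;

    /* Start a scan over the requested columns of a relation. */
    ColumnScanDesc *(*beginscan) (const char *relname,
                                  const ColumnNumber *cols, int ncols);

    /* Fetch the next logical row; returns 0 when the scan is done. */
    int         (*getnext) (ColumnScanDesc *scan, double *values);

    /* Release scan resources. */
    void        (*endscan) (ColumnScanDesc *scan);
} ColumnStoreRoutine;

/* A tiny registry: implementations register themselves by name, and a
 * hypothetical ColumnScan node looks them up at executor startup. */
#define MAX_COLSTORES 8
static const ColumnStoreRoutine *registry[MAX_COLSTORES];
static int  nregistered = 0;

static void
register_colstore(const ColumnStoreRoutine *routine)
{
    if (nregistered < MAX_COLSTORES)
        registry[nregistered++] = routine;
}

static const ColumnStoreRoutine *
lookup_colstore(const char *name)
{
    for (int i = 0; i < nregistered; i++)
        if (strcmp(registry[i]->name, name) == 0)
            return registry[i];
    return NULL;
}

int
main(void)
{
    /* Register a dummy implementation and look it up by name. */
    static const ColumnStoreRoutine dummy = {"dummy_cstore", NULL, NULL, NULL};

    register_colstore(&dummy);
    printf("found: %s\n", lookup_colstore("dummy_cstore") ? "yes" : "no");
    return 0;
}

If the store is truly native (on par with heap, as argued above), these
callbacks would in effect be fixed heap-like entry points rather than a
pluggable hook, and the "registry" could shrink to little more than a new
relkind plus a catalog entry.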
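Similarly, here is a toy sketch, again standalone C with made-up names
rather than anything from the PostgreSQL tree, of the positional-alignment
idea with one column group plus a small delta row store for updates: the
row number is just the array index, so no per-value identifier is stored,
and readers overlay the delta store on top of the columnar data.

/* Toy sketch of positional alignment with one column group plus a
 * "delta" row store for updates.  Purely illustrative; all structures
 * are hypothetical and unrelated to actual PostgreSQL code. */
#include <stdio.h>

#define NROWS 4

/* A column group: two columns stored together, aligned by row position.
 * The row number itself is virtual -- it is simply the array index. */
static int      col_a[NROWS] = {10, 20, 30, 40};
static double   col_b[NROWS] = {1.5, 2.5, 3.5, 4.5};

/* Updates go to a side row store keyed by row position; readers overlay
 * it on top of the columnar data. */
typedef struct DeltaRow
{
    int     pos;            /* which logical row was updated */
    int     a;              /* new values for the whole column group */
    double  b;
} DeltaRow;

static DeltaRow delta[] = {{2, 300, 9.9}};
static const int ndelta = 1;

static void
fetch_row(int pos, int *a, double *b)
{
    /* Positional alignment: the same index addresses every column. */
    *a = col_a[pos];
    *b = col_b[pos];

    /* Overlay any update recorded in the delta row store. */
    for (int i = 0; i < ndelta; i++)
    {
        if (delta[i].pos == pos)
        {
            *a = delta[i].a;
            *b = delta[i].b;
        }
    }
}

int
main(void)
{
    for (int pos = 0; pos < NROWS; pos++)
    {
        int     a;
        double  b;

        fetch_row(pos, &a, &b);
        printf("row %d: a=%d b=%.1f\n", pos, a, b);
    }
    return 0;
}

A background merge could periodically fold the delta rows back into the
column files; CTIDs would then only matter for the row-store side, which
matches the "CTIDs are still needed but not for this purpose" point above.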