On Tue, 05 Oct 2010 14:59:27 -0400 Tom Lane <t...@sss.pgh.pa.us> wrote: > Robert Haas <robertmh...@gmail.com> writes: > > I wonder if we > > should focus on first efforts on really simple cases like CSV files > > (as you mentioned) and perhaps something like memcached, which has > > different properties than a CSV file, but extremely simple ones. I > > think it's inevitable that the API is going to get more complicated > > from release to release and probably not in backward-compatible ways; > > I think it's too early to be worried about that. > > +1 ... we can *not* assume that we know enough to get the API right > the first time. It'll have to be stated up front that it's subject to > change and you probably shouldn't be trying to develop FDWs separately > from the main project for awhile yet. > > I think that looking only at CSV might be too simple, though. It'd be > good to be at least trying to build a remote-PG adapter, even with the > knowledge that it's all throwaway code. Thanks for comments, and sorry for delayed reply.
I propose revised version of FDW API which supports both of CSV and PG. It would be a minimal API to support SQL/MED, so some features such as optimizing query or using remote index are not considered much. ====================== This FDW API is designed to support scanning a foreign table. As SQL/MED standard says, updating external data is not supported. The instance of a foreign table may be a CSV file or a table on an external PostgreSQL server, or anything which could generate PG-style tuples. Scanning of a foreign table will be done in some steps below. Note: FdwRoutine->XXX() means FDW API function. 1. Parser parses a query string and generate a query tree. In this step, a foreign table is not distinguished from a local table. 2. Optimizer generates a Path pathnode (pathtype = T_ForeignPath) for each foreign table in the query. Because foreign tables don't have TID nor index (at least now), we don't add new Path type. In this step, optimizer calls FdwRoutine->GetStatistics() to get estimated costs of the foreign scan. FDW can calculate costs by itself with some statistics, or get costs from remote-side, or leave the costs as is. 3. Planner generates a ForeignScan plannode for each foreign Path. ForeignScan could be typedef of Scan because it has no other member, but it have been defined as a independent structure to make it be able to add other members in the future. 4. Executor executes each ForeignScan node in some steps. Through the Step-4, FDW can keep their own information about the foreign scan in the ForeignScan->FdwReply. 4-a. To initialize a ForeignScan plannodes, ExecInitForeignScan() is called for each ForeignScan node. In ExecInitForeignScan(), ForeignScanState will be created from: * ScanState which hold same information as SeqScan. * FdwRoutine pointer as cache (for cache). * some catalog information about the foreign table (for cache). * connection established with FdwRoutine->ConnectServer(). FDW can use GetFSConnectionByName(name) to get pooled connection by name in FdwRoutine->ConnectServer(). Then, ExecInitForeignScan() calls FdwRouteine->Open() to tell FDW that it's time to start the query. FDW can do some initialization if necessary. 4-b. To retrieve a tuple from the foreign table, ExecForeignScan() calls FdwRoutine->Iterate(). FDW should set next tuple into ss_ScanTupleSlot of ScanState or empty the slot to indicate EOF. Projection and qual evaluation will be done in ExecScan() later, so ExecForeignScan() should return all columns/tuples in the external data store. 4-c. To reset the scan and rewind cursor to the head of the foreign table, ExecForeignReScan() calls FdwRoutine->ReOpen(). This occurs when a ForeignScan node is a inner node of a nested loop join. FDW is required to return the first tuple again at next FdwRoutine->Iterate() call. 4-d. At the end of execution of ForeignScan, ExecEndForeignScan() is called. ExecEndForeignScan() calls FdwRoutine->Close() to tell FDW that no more Iterate will called. FDW can do some finalization if necessary. 5. Connections which have established via FdwRoutine->ConnectServer() are pooled in the backend for future query which accesses same foreign server. Pooling mechanism is implemented in core module, not in the each FDW. When a user executes DISCARD ALL command, or backend dies, FdwRoutine->FreeFSConnection() is called for each foreign connection to discard it. To achieve features above, I propose following FDW API: /* * Almost same as SeqScan, but defined as a structure to allow add * members in the future. */ typedef struct ForeignScan { Scan scan; } ForeignScan; /* * Handle to access FDW-depend data. * Each FDW can use ForeignScanState->reply with casting between FdwReply * and actual type. */ typedef FdwReply FdwReply; /* * ForeignScanState node is used to store scan status. */ typedef struct ForeignScanState { ScanState ss; /* its first field is NodeTag */ FdwRoutine *routine; /* set of FDW routines */ ForeignDataWrapper *wrapper;/* foreign data wrapper */ ForeignServer *server; /* foreign server */ FSConnection *conn; /* connection to the foreign server */ UserMapping *user; /* user mapping */ ForeignTable *table; /* foreign table */ FdwReply *reply; /* private data for each data wrapper */ } ForeignScanState; typedef struct FdwRoutine { /* * Connect to the foreign server identified by server and user. */ FSConnection* (*ConnectServer)(ForeignServer *server, UserMapping *user); /* * Disconnect from the foreign server and free FSConnection object. */ void (*FreeFSConnection)(FSConnection *conn); /* * Estimate costs of a foreign path. FDW should update startup_cost * and total_cost of the Path. * To estimate cost, PG FDW might generate "EXPLAIN" SQL and * execute it on the remote side, or collect statistics in * somewhere and calculate from them. * This parameters are same as cost_foreignscan() in * optimizer/path/costsize.c (the function is just a proxy). */ void (*GetStatistics)(Path *path, PlannerInfo *root, RelOptInfo *baserel); /* * Prepare to return tuples. */ void (*Open)(ForeignScanState *scanstate); /* * Fetch the next tuple and fill tupleslot with it, or clear the slot * to indicate EOF. */ void (*Iterate)(ForeignScanState *scanstate); /* * End the foreign scan and do some cleanup if necessary. */ void (*Close)(ForeignScanState *scanstate); /* * Re-initialize the foreign scan, used when the INNER executor node is * executed again. */ void (*ReOpen)(ForeignScanState *scanstate); } FdwRoutine; ====================== Issues from ideas above: == Connection Pooling == Connect on each foreign scan is obviously useless, but it's open to argument that who should manage the pooled connections, because for CSV wrapper, no "connection" is needed at all, but PG wrapper might connect once per database per user. Maybe DBAs want to see how much and what kind of connections are established now, so we should show list of active connections via a view or function. To treat connections transparently, I propose that connection pooling mechanism in the core module, not in each FDW module. Or should we add API to return list of active connections in common format? If we have decided to leave connection management to each FDWs, ConnectServer() and FreeFSConnection() can be removed from API. == ANALYZE support == Even if a FDW wants to keep statistics in local area such as pg_class.reltuples, current API doesn't provide appropriate timing. Should we add API which is called from ANALYZE to allow FDWs to handle statistics when user wants. Any comments are welcome. Regards, -- Shigeru Hanada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers