* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
> Yes, the part-1 patch provides a set of interface portion to interact
> between the backend code and extension code. Rest of part-2 and part-3
> portions are contrib modules that implements its feature on top of
> custom-scan API.

Just to come back to this- the other two "contrib module" patches, at
least as I read over their initial submission, were *also* patching
portions of backend code which it was apparently discovered that they
needed. That's a good bit of my complaint regarding this approach.

> FDW's join pushing down is one of the valuable use-cases of this interface,
> but not all. As you might know, my motivation is to implement GPU acceleration
> feature on top of this interface, that offers alternative way to scan or join
> relations or potentially sort or aggregate.

If you're looking to just use GPU acceleration for improving individual
queries, I would think that Robert's work around backend workers would
be a more appropriate way to go, with the ability to move a working set
of data from shared buffers and on-disk representation of a relation
over to the GPU's memory, perform the operation, and then copy the
results back. If that's not possible or effective wrt performance, then
I think we need to look at managing the external GPU memory as a foreign
system through an FDW which happens to be updated through triggers or
similar. The same could potentially be done for memcached systems, etc.

"regular" PG tables, just to point out one issue, can be locked on a
row-by-row basis, and we know exactly where in shared buffers to go hunt
down the rows. How is that going to work here, if this is both a
"regular" table and stored off in a GPU's memory across subsequent
queries or even transactions?

> Right now, I put all the logic to interact CSI and FDW driver on postgres_fdw
> side, it might be an idea to have common code (like a logic to check whether
> the both relations to be joined belongs to same foreign server) on the backend
> side as something like a gateway of them.

Yes, that's what I was suggesting above- we should be asking the FDWs
on a case-by-case basis how to cost out the join between foreign tables
which they are responsible for.
Asking two different FDW servers to cost out a join between their tables
doesn't make any sense to me.

> As an aside, what should be the scope of FDW interface?
> In my understanding, it allows extension to implement "something" on behalf of
> a particular data structure being declared with CREATE FOREIGN TABLE.

That's where it is today, but certainly not our end goal.

> In other words, extension's responsibility is to generate a view of
> "something" according to PostgreSQL' internal data structure, instead
> of the object itself.

The result of the FDW call needs to be something which PG understands
and can work with, otherwise we wouldn't be able to, say, run PL/pgsql
code on the result, or pass it into some other aggregate which we
decided was cheaper to run locally. Being able to push down aggregates
to the remote side of an FDW certainly fits in quite well with that.

> On the other hands, custom-scan interface allows extensions to implement
> alternative methods to scan or join particular relations, but it is not a role
> to perform as a target being referenced in queries. In other words, it is
> methods to access objects.

The custom-scan interface still needs to produce "something" according
to PG's internal data structures, so it's not clear to me where you're
going with this.

> It is natural both features are similar because both of them intends
> extensions to hook the planner and executor, however, its purpose is
> different.

I disagree as I don't really view FDWs as "hooks". A "hook" is more
like a trigger- sure, you can modify the data in transit, or throw an
error if you see an issue, but you don't get to redefine the world and
throw out what the planner or optimizer knows about the rest of what is
going on in the query.

Thanks,

Stephen