> > > If you're looking to just use GPU acceleration for improving
> > > individual queries, I would think that Robert's work around backend
> > > workers would be a more appropriate way to go, with the ability to
> > > move a working set of data from shared buffers and on-disk
> > > representation of a relation over to the GPU's memory, perform the
> > > operation, and then copy the results back.
> >
> > The approach is similar to Robert's work, except that it adopts GPUs
> > instead of multicore CPUs. So, I tried to review his work to apply
> > those facilities to my extension as well.
>
> Good, I'd be very curious to hear how that might solve the issue for you,
> instead of using the CustomScan approach.
>
I (plan to) use custom-scan, of course. Once a relation is referenced and
the optimizer decides GPU acceleration is cheaper, the associated
custom-scan node reads the data from the underlying relation (or from an
in-memory cache, if one exists), then moves it into a shared memory buffer
for delivery to the GPU management background worker, which launches
asynchronous DMA transfers one by one.
After that, the custom-scan node receives the filtered records via the
shared-memory buffer, so it can construct the tuples to be returned to the
upper node.
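To make the round trip concrete, below is a minimal single-process mock of
that handoff. It is only a sketch of the protocol described above: the
names (gpu_chunk, worker_filter_chunk, and so on) are made up, the "worker"
runs synchronously in-process, and the real design would place the chunk in
PostgreSQL shared memory and let the background worker drive asynchronous
DMA.

/*
 * Single-process mock of the scan-node / GPU-worker handoff.
 * All names here are illustrative, not actual PG-Strom code.
 */
#include <stdbool.h>
#include <stdio.h>

#define CHUNK_NROWS 4

typedef struct
{
    int     nrows;
    int     values[CHUNK_NROWS];    /* stand-in for copied tuples   */
    bool    matched[CHUNK_NROWS];   /* qualifier results from "GPU" */
} gpu_chunk;

/* GPU-worker side: evaluate the qualifier over the whole chunk at once */
static void
worker_filter_chunk(gpu_chunk *chunk)
{
    int     i;

    for (i = 0; i < chunk->nrows; i++)
        chunk->matched[i] = (chunk->values[i] % 2 == 0); /* e.g. "x % 2 = 0" */
}

int
main(void)
{
    /* custom-scan side: copy a working set into the shared chunk */
    gpu_chunk   chunk = { CHUNK_NROWS, {11, 22, 33, 44}, {false} };
    int         i;

    /* hand the chunk over (in reality: enqueue it and kick async DMA) */
    worker_filter_chunk(&chunk);

    /* custom-scan side: build result tuples from the filtered rows */
    for (i = 0; i < chunk.nrows; i++)
    {
        if (chunk.matched[i])
            printf("return tuple with value %d to the upper node\n",
                   chunk.values[i]);
    }
    return 0;
}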
> > > "regular" PG tables, just to point out one issue, can be locked on a > > > row-by-row basis, and we know exactly where in shared buffers to go > > > hunt down the rows. How is that going to work here, if this is both > a "regular" > > > table and stored off in a GPU's memory across subsequent queries or > > > even transactions? > > > > > It shall be handled "case-by-case" basis, I think. If row-level lock > > is required over the table scan, custom-scan node shall return a tuple > > being located on the shared buffer, instead of the cached tuples. Of > > course, it is an option for custom-scan node to calculate qualifiers > > by GPU with cached data and returns tuples identified by ctid of the cached > tuples. > > Anyway, it is not a significant problem. > > I think you're being a bit too hand-wavey here, but if we're talking about > pre-scanning the data using PG before sending it to the GPU and then only > performing a single statement on the GPU, we should be able to deal with > it. It's what I want to implement. > I'm worried about your ideas to try and cache things on the GPU though, > if you're not prepared to deal with locks happening in shared memory on > the rows you've got cached out on the GPU, or hint bits, or the visibility > map being updated, etc... > It does not remain any state/information on the GPU side. Things related to PG internal stuff is job of CPU. > > OK, I'll move the portion that will be needed commonly for other FDWs > > into the backend code. > > Alright- but realize that there may be objections there on the basis that > the code/structures which you're exposing aren't, and will not be, stable. > I'll have to go back and look at them myself, certainly, and their history. > I see, but it is a process during code getting merged. > > Yes. According to the previous discussion around postgres_fdw getting > > merged, all we can trust on the remote side are built-in data types, > > functions, operators or other stuffs only. > > Well, we're going to need to expand that a bit for aggregates, I'm afraid, > but we should be able to define the API for those aggregates very tightly > based on what PG does today and require that any FDW purporting to provides > those aggregates do it the way PG does. Note that this doesn't solve all > the problems- we've got other issues with regard to pushing aggregates down > into FDWs that need to be solved. > I see. It probably needs more detailed investigation. > > The custom-scan node is intended to perform on regular relations, not > > only foreign tables. It means a special feature (like GPU > > acceleration) can perform transparently for most of existing > > applications. Usually, it defines regular tables for their work on > > installation, not foreign tables. It is the biggest concern for me. > > The line between a foreign table and a local one is becoming blurred already, > but still, if this is the goal then I really think the background worker > is where you should be focused, not on this Custom Scan API. 
> > OK, I'll move the portion that will be needed commonly for other FDWs
> > into the backend code.
>
> Alright- but realize that there may be objections there on the basis
> that the code/structures which you're exposing aren't, and will not be,
> stable. I'll have to go back and look at them myself, certainly, and
> their history.
>
I see, but that is part of the process of getting the code merged.

> > Yes. According to the previous discussion around postgres_fdw getting
> > merged, all we can trust on the remote side are built-in data types,
> > functions, operators, and other such things only.
>
> Well, we're going to need to expand that a bit for aggregates, I'm
> afraid, but we should be able to define the API for those aggregates
> very tightly based on what PG does today and require that any FDW
> purporting to provide those aggregates do it the way PG does. Note that
> this doesn't solve all the problems- we've got other issues with regard
> to pushing aggregates down into FDWs that need to be solved.
>
I see. It probably needs more detailed investigation.

> > The custom-scan node is intended to work on regular relations, not
> > only on foreign tables. That means a special feature (like GPU
> > acceleration) can operate transparently for most existing
> > applications, which usually define regular tables, not foreign
> > tables, at installation time. That is the biggest concern for me.
>
> The line between a foreign table and a local one is becoming blurred
> already, but still, if this is the goal then I really think the
> background worker is where you should be focused, not on this Custom
> Scan API. Consider that, once we've got proper background workers,
> we're going to need new nodes which operate in parallel (or some other
> rejiggering of the nodes- I don't pretend to know exactly what Robert
> is thinking here, and I've apparently forgotten it if he's posted it
> somewhere) and those interfaces may drive changes which would impact
> the Custom Scan API- or worse, make us deprecate or regret having
> added it because now we'll need to break backwards compatibility to
> add in the parallel node capability to satisfy the more general
> non-GPU case.
>
The custom-scan API is a thin abstraction over the plan node interface,
not tightly coupled to any particular use case such as GPU acceleration or
remote joins. So, I'm quite optimistic about its future maintainability.
Also, please recall the discussion at the last developer meeting. The
purpose of custom-scan (we didn't name it at that time) is to avoid
unnecessary project forks by people who want to implement their own
special feature when no facilities to extend the optimizer/executor are
available. Even once we have an in-core parallel execution feature on
CPUs, it still makes sense to allow unique implementations that may be
better suited to a specific domain.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>