Hi,

On 2019-05-15 12:55:33 -0400, Korry Douglas wrote:
> Hi all, I’m working on an FDW that would benefit greatly from parallel
> foreign scan. I have implemented the callbacks described
> here: https://www.postgresql.org/docs/devel/fdw-callbacks.html#FDW-CALLBACKS-PARALLEL
> and I see a big improvement in certain plans.
>
> My problem is that I can’t seem to get a parallel foreign scan in a query
> that does not contain an aggregate.
>
> For example:
>    SELECT count(*) FROM foreign table;
> Gives me a parallel scan, but
>    SELECT * FROM foreign table;
> Does not.

Well, that'd be bound by the cost of transferring tuples between workers
and leader. You don't get a parallel scan for the equivalent local table
scan either, unless you fiddle heavily with the costs. You can probably
force the planner's hand by setting parallel_setup_cost and
parallel_tuple_cost very low - but it's unlikely to be beneficial.

If you added a where clause that needs to be evaluated outside the FDW,
you'd probably see parallel scans without fiddling with the costs.

> A second related question - how can I find the actual number of
> workers chosen for my ForeignScan? At the moment, I’m looking at
> ParallelContext->nworkers (inside of the InitializeDSMForeignScan()
> callback) because that seems to be the first callback function that
> might provide the worker count. I need the *actual* worker count in
> order to evenly distribute my workload. I can’t use the usual trick
> of having each worker grab the next available chunk (because I have to
> avoid seek operations on compressed data). In other words, it is of
> great advantage for each worker to read contiguous chunks of data -
> seeking to another part of the file is prohibitively expensive.

I don't think - but am not sure - that there's a nicer way currently.
Although I'd use nworkers_launched, rather than nworkers.

Greetings,

Andres Freund
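
The even, contiguous split described in the question can be sketched as
below. This is a minimal, standalone illustration rather than FDW code:
ChunkRange, contiguous_range(), and the use of "blocks" as the unit of
work are hypothetical names and assumptions. How each participant learns
the participant count and its own index (for example, from a value such
as nworkers_launched, as suggested above) is left outside the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor for one participant's contiguous slice. */
typedef struct ChunkRange
{
    uint64_t first_block;   /* first block this participant reads */
    uint64_t nblocks;       /* number of consecutive blocks to read */
} ChunkRange;

/*
 * Split total_blocks into n_participants contiguous ranges whose sizes
 * differ by at most one block, and return the range for the participant
 * with the given zero-based index. The first (total_blocks %
 * n_participants) participants each get one extra block.
 */
static ChunkRange
contiguous_range(uint64_t total_blocks, uint32_t n_participants, uint32_t index)
{
    ChunkRange r;
    uint64_t base;
    uint64_t extra;

    assert(n_participants > 0);
    assert(index < n_participants);

    base = total_blocks / n_participants;
    extra = total_blocks % n_participants;

    /* Skip the full ranges before us, plus one extra block per earlier
     * participant that received a remainder block. */
    r.first_block = (uint64_t) index * base + (index < extra ? index : extra);
    r.nblocks = base + (index < extra ? 1 : 0);
    return r;
}
```

With 10 blocks and 3 participants, this yields the ranges [0,4), [4,7)
and [7,10), so each participant reads one contiguous run and never has
to seek into another participant's part of the file.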