On Thu, May 16, 2019 at 5:46 AM Korry Douglas <ko...@me.com> wrote:
> >> But, nworkers_launched is always set to 0 in
> >> InitializeDSMForeignScan(), so that won’t work.  Any other ideas?
> >
> > At that state it's simply not yet known how many workers will be
> > actually launched (they might not start successfully or such). Why do
> > you need to know it there and not later?
> >
> > - Andres
>
> I need to know at some point *before* I actually start scanning.  The
> ParallelContext pointer is only available in EstimateDSMForeignScan(),
> InitializeDSMForeignScan(), and ReInitializeDSMForeignScan().
Hi Korry,

That's only a superficial problem.  You don't even know if or when the
workers that are launched will all finish up running your particular
node, because (for example) they might be sent to different children of
a Parallel Append node above you (AFAICS there is no way for a
participant to indicate "I've finished all the work allocated to me,
but I happen to know that some other worker #3 is needed here" -- as
soon as any participant reports that it has executed the plan to
completion, pa_finished[] will prevent new workers from picking that
node to execute).

Suppose we made a rule that *every* worker must visit *every* partial
child of a Parallel Append and run it to completion (and that any
similar node in the future must do the same)...  then I think there is
still a higher-level design problem: if you allocate work up front
rather than on demand, the work can be distributed unevenly, and
parallel query is weakened.

So I think you ideally need a simple get-next-chunk work allocator
(like Parallel Seq Scan, and like the file_fdw patch I posted[1]), or a
pass-the-baton work allocator when there is a dependency between chunks
(like Parallel Index Scan for btrees), or a more complicated
multi-phase system that counts participants arriving and joining in
(like Parallel Hash), so that participants can coordinate and wait for
each other in controlled circumstances.  (A very rough sketch of the
get-next-chunk approach is at the end of this mail.)

If this compressed data doesn't have natural chunks designed for this
purpose (like, say, ORC stripes), perhaps you could have a dedicated
worker streaming data (compressed? decompressed?) into shared memory,
and parallel query participants coordinating to consume chunks of that?

[1] https://www.postgresql.org/message-id/CA%2BhUKG%2BqK3E2RF75PKfsV0sn2s018%2Bft--hUuCmd2R_yQ9tmPQ%40mail.gmail.com

--
Thomas Munro
https://enterprisedb.com
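
PS: In case it's useful, here's a very rough, untested sketch of the
get-next-chunk idea, using the per-node DSM space the executor hands
you through the "coordinate" pointer.  All the my_fdw_* names and the
MyFdwScanState/MyFdwParallelState structs are made up for illustration;
the only real interfaces used are the FDW parallel callbacks and the
atomics from port/atomics.h.

/* Hypothetical sketch -- not from any real FDW. */
#include "postgres.h"

#include "access/parallel.h"
#include "foreign/fdwapi.h"
#include "port/atomics.h"
#include "storage/shm_toc.h"

/* Shared state placed in this node's DSM area (the coordinate pointer). */
typedef struct MyFdwParallelState
{
	uint64		nchunks;			/* total number of chunks to scan */
	pg_atomic_uint64 next_chunk;	/* next chunk to hand out */
} MyFdwParallelState;

/* Per-participant scan state, hanging off node->fdw_state. */
typedef struct MyFdwScanState
{
	uint64		nchunks;			/* computed at BeginForeignScan time */
	MyFdwParallelState *pstate;		/* NULL in a non-parallel scan */
} MyFdwScanState;

static Size
my_fdw_estimate_dsm(ForeignScanState *node, ParallelContext *pcxt)
{
	return sizeof(MyFdwParallelState);
}

static void
my_fdw_initialize_dsm(ForeignScanState *node, ParallelContext *pcxt,
					  void *coordinate)
{
	MyFdwParallelState *pstate = (MyFdwParallelState *) coordinate;
	MyFdwScanState *fsstate = (MyFdwScanState *) node->fdw_state;

	/* Leader sets up the shared chunk counter before workers launch. */
	pstate->nchunks = fsstate->nchunks;
	pg_atomic_init_u64(&pstate->next_chunk, 0);
	fsstate->pstate = pstate;
}

static void
my_fdw_initialize_worker(ForeignScanState *node, shm_toc *toc,
						 void *coordinate)
{
	MyFdwScanState *fsstate = (MyFdwScanState *) node->fdw_state;

	/* Workers just attach to the leader's shared state. */
	fsstate->pstate = (MyFdwParallelState *) coordinate;
}

/*
 * Claim the next chunk, or return false if there's nothing left.  Any
 * participant calls this whenever it needs more work, so it doesn't
 * matter how many workers were actually launched or when they turn up.
 */
static bool
my_fdw_next_chunk(MyFdwScanState *fsstate, uint64 *chunkno)
{
	uint64		chunk;

	chunk = pg_atomic_fetch_add_u64(&fsstate->pstate->next_chunk, 1);
	if (chunk >= fsstate->pstate->nchunks)
		return false;
	*chunkno = chunk;
	return true;
}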