On Thu, Nov 21, 2019 at 9:15 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Thu, Nov 21, 2019 at 6:53 AM Masahiko Sawada
> <masahiko.saw...@2ndquadrant.com> wrote:
> >
> > On Wed, 20 Nov 2019 at 20:36, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > >
> > > On Wed, Nov 20, 2019 at 4:04 PM Masahiko Sawada
> > > <masahiko.saw...@2ndquadrant.com> wrote:
> > > >
> > > > On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > > >
> > > > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > > > > <masahiko.saw...@2ndquadrant.com> wrote:
> > > > > >
> > > > > > I've attached the latest version patch set. The patch set includes all
> > > > > > the discussed points regarding index AM options as well as shared cost
> > > > > > balance. I also added some test cases using all types of index AM.
> > > > > >
> > > > > > During development I had one concern about the number of parallel
> > > > > > workers to launch. In the current design each index AM can choose
> > > > > > whether to participate in parallel bulk-deletion and parallel cleanup.
> > > > > > That also means the number of parallel workers to launch might differ
> > > > > > between parallel bulk-deletion and parallel cleanup. In the current
> > > > > > patch the leader always launches as many workers as there are indexes
> > > > > > that support either one, but that would not be efficient in some cases.
> > > > > > For example, if we have 3 indexes supporting only parallel
> > > > > > bulk-deletion and 2 indexes supporting only parallel index cleanup, we
> > > > > > would launch 5 workers for each execution but some workers would do
> > > > > > nothing at all. To deal with this problem, I wonder if we can improve
> > > > > > parallel query so that the leader process creates a parallel context
> > > > > > with the maximum number of indexes and can launch only a part of the
> > > > > > workers instead of all of them.
> > > > > >
> > > > >
> > > > > Can't we choose the number of workers as the maximum of
> > > > > "num_of_indexes_that_support_bulk_del" and
> > > > > "num_of_indexes_that_support_cleanup"?  If we can do that, then we can
> > > > > always launch the required number of workers for each phase (bulk_del,
> > > > > cleanup).  In your above example, it should choose 3 workers while
> > > > > creating a parallel context.  Do you see any problem with that?
> > > >
> > > > I might be missing something, but if we create the parallel context
> > > > with 3 workers the leader process always launches 3 workers. Therefore
> > > > in the above case it launches 3 workers even in cleanup, although 2
> > > > workers are enough.
> > > >
> > >
> > > Right, so we can either extend the parallel API to launch fewer workers
> > > than the parallel context has, as suggested by you, or we can use a
> > > separate parallel context for each phase. Going with the former has the
> > > benefit that we don't need to recreate the parallel context, and the
> > > latter has the advantage that we won't keep additional shared memory
> > > allocated.
> >
> > I also thought of using separate parallel contexts for each phase, but
> > can the same DSM be used by parallel workers initiated from different
> > parallel contexts? If not, I think that doesn't work because parallel
> > vacuum needs to set data in the DSM during ambulkdelete and then the
> > parallel workers for amvacuumcleanup need to access it.
>
> We can probably copy the stats into local memory instead of pointing it
> to the DSM after bulk-deletion, but I think that would be unnecessary
> overhead and doesn't sound like a good idea.
I agree that it will be unnecessary overhead.

> > > BTW, what kind of API change do you have in mind for
> > > the approach you are suggesting?
> >
> > I was thinking to add a new API, say LaunchParallelNWorkers(pcxt, n),
> > where n is the number of workers the caller wants to launch and should
> > be lower than the value in the parallel context.
>
> For that, won't you need to duplicate most of the code of
> LaunchParallelWorkers, or maybe move the entire code into
> LaunchParallelNWorkers so that LaunchParallelWorkers can also call it?
> Another idea could be to just extend the existing API
> LaunchParallelWorkers to take the number of workers as an input
> parameter. Do you see any problem with that, or is there a reason you
> prefer to write a new API for this?

I think we can pass an extra parameter to LaunchParallelWorkers, and
there we can try to launch min(pcxt->nworkers, n). Or we can put an
assert (n <= pcxt->nworkers).
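Just to make that concrete, here is a rough standalone sketch of the
clamping behaviour I have in mind (not actual backend code: ParallelContext
is reduced to the single field that matters here, and the parameter name
nworkers_to_launch is only illustrative):

#include <stdio.h>

/* Stand-in for the backend's ParallelContext; only the field we need. */
typedef struct ParallelContext
{
    int     nworkers;   /* workers registered when the context was created */
} ParallelContext;

/*
 * Sketch: the caller passes how many workers it actually wants this time,
 * and we launch min(pcxt->nworkers, nworkers_to_launch).  The real function
 * would go on to register background workers; here we only return the
 * clamped count to show the idea.
 */
static int
LaunchParallelWorkers_sketch(ParallelContext *pcxt, int nworkers_to_launch)
{
    /* Alternative policy: assert(nworkers_to_launch <= pcxt->nworkers); */
    return (nworkers_to_launch < pcxt->nworkers) ?
        nworkers_to_launch : pcxt->nworkers;
}

int
main(void)
{
    ParallelContext pcxt = {3}; /* e.g. 3 indexes support bulk-deletion */

    /* The cleanup phase needs only 2 workers, so only 2 are launched. */
    printf("%d\n", LaunchParallelWorkers_sketch(&pcxt, 2));  /* prints 2 */
    printf("%d\n", LaunchParallelWorkers_sketch(&pcxt, 5));  /* prints 3 */
    return 0;
}

With that, vacuum could size one parallel context for the larger phase
(3 workers in the example above) and still launch only 2 workers for the
cleanup phase.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com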