On Sat, Feb 4, 2017 at 5:54 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Feb 1, 2017 at 12:58 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> Yeah, I understand that point and I can see there is strong argument
>> to do that way, but let's wait and see what others including Robert
>> have to say about this point.
>
> It seems to me that you can make an argument for any point of view.
> In a parallel sequential scan, the smallest unit of work that can be
> given to one worker is one heap page; in a parallel index scan, it's
> one index page.  By that logic, as Rahila says, we ought to do this
> based on the number of index pages.  On the other hand, it's weird to
> use the same GUC to measure index pages at some times and heap pages
> at other times, and it could result in failing to engage parallelism
> where we really should do so, or using an excessively small number of
> workers.  An index scan that hits 25 index pages could hit 1000 heap
> pages; if it's OK to use a parallel sequential scan for a table with
> 1000 heap pages, why is it not OK to use a parallel index scan to scan
> 1000 heap pages?  I can't think of any reason.
>

I think one difference is that if we want to scan 1000 heap pages with
a parallel index scan, the cost of scanning the index is additional
compared to a parallel sequential scan.

> On balance, I'm somewhat inclined to think that we ought to base
> everything on heap pages, so that we're always measuring in the same
> units.  That's what Dilip's patch for parallel bitmap heap scan does,
> and I think it's a reasonable choice.  However, for parallel index
> scan, we might want to also cap the number of workers to, say,
> index_pages/10, just so we don't pick an index scan that's going to
> result in a very lopsided work distribution.
>

I guess in the above context you mean the heap_pages or index_pages
that are expected to be *fetched* during the index scan.

Yet another thought is that for parallel index scans we base the
decision on index_pages_fetched, but either add a different GUC
(min_parallel_index_rel_size) with a relatively lower default value
(say, min_parallel_relation_size/4 = 2MB) or directly use
min_parallel_relation_size/4 as the threshold for parallel index scans.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
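
For illustration only, here is a rough sketch of the two options
discussed above. It is not taken from any patch. All function and
parameter names are hypothetical, and the "one more worker each time
the page count triples" rule is an assumption made for this sketch.
Option A sizes the scan by heap pages and caps the result at
index_pages/10; option B sizes it by index pages against a threshold
of one quarter of the heap threshold.

```c
#include <limits.h>

/*
 * Hypothetical helper: 0 workers below the threshold, 1 worker at the
 * threshold, and one more worker each time the page count triples
 * (an assumed growth rule), limited to max_workers.
 */
static int
sketch_workers_for_pages(long pages, long threshold_pages, int max_workers)
{
    long threshold = threshold_pages > 0 ? threshold_pages : 1;
    int  workers;

    if (max_workers <= 0 || pages < threshold)
        return 0;

    workers = 1;
    while (workers < max_workers &&
           threshold <= LONG_MAX / 3 &&     /* avoid overflow */
           pages >= threshold * 3)
    {
        workers++;
        threshold *= 3;
    }
    return workers;
}

/*
 * Option A: measure in heap pages expected to be fetched, then cap the
 * worker count at index_pages_fetched / 10 to avoid a lopsided split.
 */
int
sketch_index_scan_workers_heap(long heap_pages_fetched,
                               long index_pages_fetched,
                               long threshold_pages,
                               int max_workers)
{
    int  workers = sketch_workers_for_pages(heap_pages_fetched,
                                            threshold_pages, max_workers);
    long cap = index_pages_fetched / 10;

    if (cap < 0)
        cap = 0;
    if (workers > cap)
        workers = (int) cap;
    return workers;
}

/*
 * Option B: measure in index pages expected to be fetched, against a
 * threshold of one quarter of the heap-page threshold.
 */
int
sketch_index_scan_workers_index(long index_pages_fetched,
                                long threshold_pages,
                                int max_workers)
{
    return sketch_workers_for_pages(index_pages_fetched,
                                    threshold_pages / 4, max_workers);
}
```

With a threshold of 1000 pages, option A gives one worker for the
example of 25 index pages and 1000 heap pages. If the same scan
fetched 9000 heap pages, the heap-based rule alone would give three
workers, but the index_pages/10 cap would reduce that to two.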