On Wed, Jan 8, 2020 at 2:38 AM Heikki Linnakangas <hlinn...@iki.fi> wrote:
> This makes the assumption that all Aggrefs or GroupingFuncs are at the > top of the TargetEntry. That's not true, e.g.: > > select 0+sum(a) from foo group by b; > > I think find_aggregated_cols() and find_unaggregated_cols() should be > merged into one function that scans the targetlist once, and returns two > Bitmapsets. They're always used together, anyway. > > So, I've attached a patch that does what Heikki recommended and gets both aggregated and unaggregated columns in two different bitmapsets. I think it works for more cases than the other patch. I'm not sure it is the ideal interface, but, since there aren't many consumers, I don't know. Also, it needs some formatting/improved naming/etc. Per Jeff's comment in [1] I started looking into using the scanCols patch from the thread on extracting scanCols from PlannerInfo [2] to get the aggregated and unaggregated columns for this patch. Since we only make one bitmap for scanCols containing all of the columns that need to be scanned, there is no context about where the columns came from in the query. That is, once the bit is set in the bitmapset, we have no way of knowing if that column was needed for aggregation or if it is filtered out immediately. We could solve this by creating multiple bitmaps at the time that we create the scanCols field -- one for aggregated columns, one for unaggregated columns, and, potentially more if useful to other consumers. The initial problem with this is that we extract scanCols from the PlannerInfo->simple_rel_array and PlannerInfo->simple_rte_array. If we wanted more context about where those columns were from in the query, we would have to either change how we construct the scanCols or construct them early and add to the bitmap when adding columns to the simple_rel_array and simple_rte_array (which, I suppose, is the same thing as changing how we construct scanCols). This might decentralize the code for the benefit of one consumer. Also, looping through the simple_rel_array and simple_rte_array a couple times per query seems like it would add negligible overhead. I'm more hesitant to add code that, most likely, would involve a walker to the codepath everybody uses if only agg will leverage the two distinct bitmapsets. Overall, I think it seems like a good idea to leverage scanCols for determining what columns hashagg needs to spill, but I can't think of a way of doing it that doesn't seem bad. scanCols are currently just that -- columns that will need to be scanned. [1] https://www.postgresql.org/message-id/e5566f7def33a9e9fdff337cca32d07155d7b635.camel%40j-davis.com [2] https://www.postgresql.org/message-id/flat/CAAKRu_Yj%3DQ_ZxiGX%2BpgstNWMbUJApEJX-imvAEwryCk5SLUebg%40mail.gmail.com -- Melanie Plageman
v1-0001-aggregated-unaggregated-cols-together.patch
Description: Binary data