On Fri, Oct 21, 2016 at 4:25 PM, Amit Kapila <amit.kapil...@gmail.com> wrote: > On Tue, Oct 18, 2016 at 3:48 AM, Peter Geoghegan <p...@heroku.com> wrote: >> On Mon, Oct 17, 2016 at 5:36 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: >> >> I read the following paragraph from the Volcano paper just now: >> >> """ >> During implementation and benchmarking of parallel sorting, we added >> two more features to exchange. First, we wanted to implement a merge >> network in which some processors produce sorted streams merge >> concurrently by other processors. Volcano’s sort iterator can be used >> to generate a sorted stream. A merge iterator was easily derived from >> the sort module. It uses a single level merge, instead of the cascaded >> merge of runs used in sort. The input of a merge iterator is an >> exchange. Differently from other operators, the merge iterator >> requires to distinguish the input records by their producer. As an >> example, for a join operation it does not matter where the input >> records were created, and all inputs can be accumulated in a single >> input stream. For a merge operation, it is crucial to distinguish the >> input records by their producer in order to merge multiple sorted >> streams correctly. >> """ >> >> I don't really understand this paragraph, but thought I'd ask: why the >> need to "distinguish the input records by their producer in order to >> merge multiple sorted streams correctly"? Isn't that talking about >> partitioning, where each workers *ownership* of a range matters? >> > > I think so, but it seems from above text that is mainly required for > merge iterator which probably will be used in merge join. > >> My >> patch doesn't care which values belong to which workers. And, it >> focuses quite a lot on dealing well with the memory bandwidth bound, >> I/O bound part of the sort where we write out the index itself, just >> by piggy-backing on tuplesort.c. I don't think that that's useful for >> a general-purpose executor node -- tuple-at-a-time processing when >> fetching from workers would kill performance. >> > > Right, but what is written in text quoted by you seems to be do-able > with tuple-at-a-time processing. >
To be clear, by saying above, I don't mean that we should try that approach instead of what you are proposing, but it is worth some discussion to see if that has any significant merits. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers