2015-12-23 21:36 GMT+01:00 Daniel Verite <dan...@manitou-mail.org>: > Hi, > > Here's an updated patch that replaces sorted arrays by AVL binary trees > when gathering distinct values for the columns involved in the pivot. > The change is essential for large resultsets. For instance, > it allows to process a query like this (10 million rows x 10 columns): > > select x,(random()*10)::int, (random()*1000)::int from > generate_series(1,10000000) as x > \crosstabview > > which takes about 30 seconds to run and display on my machine with the > attached patch. That puts it seemingly in the same ballpark than > the equivalent test with the server-side crosstab(). > > With the previous iterations of the patch, this test would never end, > even with much smaller sets, as the execution time of the 1st step > grew exponentially with the number of distinct keys. > The exponential effect starts to be felt at about 10k values on my low-end > CPU, > and from there quickly becomes problematic. > > As a client-side display feature, processing millions of rows like in > the query above does not necessarily make sense, it's pushing the > envelope, but stalling way below 100k rows felt lame, so I'm happy to get > rid of that limitation. > > However, there is another one. The above example does not need or request > an additional sort step, but if it did, sorting more than 65535 entries in > the vertical header would error out, because values are shipped as > parameters to PQexecParams(), which only accepts that much. > To avoid the problem, when the rows in the output "grid" exceed 2^16 and > they need to be sorted, the user must let the sort being driven by ORDER > BY > beforehand in the query, knowing that the pivot will keep the original > ordering intact in the vertical header. > > I'm still thinking about extending this based on Pavel's diff for the > "label" column, so that > \crosstabview [+|-]colV[:colSortH] [+|-]colH[:colSortH] > would mean to use colV/H as grid headers but sort them according > to colSortV/H. > I prefer that syntax over adding more parameters, and also I'd like > to have it work in both V and H directions. >
This syntax is good - simple, readable Pavel > > Aside from the AVL trees, there are a few other minor changes in that > patch: > - move non-exportable structs from the .h to the .c > - move code in common.c to respect alphabetical ordering > - if vertical sort is requested, add explicit check against more than 65535 > params instead of letting the sort query fail > - report all failure cases of the sort query > - rename sortColumns to serverSort and use less the term "columns" in > comments and variables. > > > Best regards, > -- > Daniel Vérité > PostgreSQL-powered mailer: http://www.manitou-mail.org > Twitter: @DanielVerite > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >