On Sat, Mar 21, 2015 at 2:58 AM, Andrew Gierth
<and...@tao11.riddles.org.uk> wrote:
>  Peter> I don't really buy it, either way. In what sense is a NULL value
>  Peter> ever abbreviated? It isn't. Whatever about the cost model,
>  Peter> that's the truth of the matter. There is always going to be a
>  Peter> sort of tension in any cost model, between whether or not it's
>  Peter> worth making it more sophisticated, and the extent to which
>  Peter> tweaking the model is chasing diminishing returns.
>
> Comparisons between nulls and nulls, or between nulls and non-nulls, are
> cheap; only comparisons between non-nulls and non-nulls can be
> expensive.
>
> The purpose of abbreviation is to replace expensive comparisons by cheap
> ones where possible, and therefore the cost model used for abbreviation
> should ignore nulls entirely; all that matters is the number of non-null
> values and the probability of saving time by abbreviating them.
>
> So if you're sorting a million rows of which 900,000 are null and
> 100,000 contain 50 different non-null values, then the absolute time
> saved (not the proportion) by doing abbreviation should be on the same
> order as the absolute time saved by abbreviation when sorting just the
> 100,000 non-null rows.
>
> But if the cost model does 1,000,000/50 and gets 20,000, and decides
> "that's worse than my 1 in 10,000 target, I'll abort abbreviations",
> then you have sacrificed the time gain for no reason at all. This is
> what I mean by "spurious". This is why the cost model must compute the
> fraction as 100,000/50, ignoring the null inputs, if it's going to
> perform anything like optimally in the presence of nulls.
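
To make the arithmetic in the quoted example concrete, here is a minimal sketch in Python. It is not the tuplesort code or its actual abort logic: the function names, the `target` parameter, and the simple "rows per distinct value" test are all hypothetical, chosen only to mirror the numbers Andrew gives above.

```python
# Hypothetical illustration only; this is not PostgreSQL's cost model.

def rows_per_distinct(n_rows, n_distinct):
    """Average number of input rows per distinct abbreviated key."""
    return n_rows / n_distinct


def should_abort(n_rows, n_distinct, target=10_000):
    """Assumed rule: give up on abbreviation when there are more than
    `target` rows per distinct value (the "1 in 10,000" from the example)."""
    return rows_per_distinct(n_rows, n_distinct) > target


total_rows = 1_000_000      # all rows being sorted
null_rows = 900_000         # rows whose sort key is NULL
distinct_non_null = 50      # distinct non-null values

# Counting the nulls: 1,000,000 / 50 = 20,000 rows per distinct value.
counting_nulls = should_abort(total_rows, distinct_non_null)

# Ignoring the nulls: 100,000 / 50 = 2,000 rows per distinct value.
ignoring_nulls = should_abort(total_rows - null_rows, distinct_non_null)

print(counting_nulls, ignoring_nulls)
```

Under these assumed numbers, the first call decides to abort and the second does not, even though the expensive non-null comparisons are the same in both cases.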

I think Andrew is right.

>  Peter> I also think that your explanation of the encoding schemes was
>  Peter> perfunctory.
>
> I'm interested in other opinions on that, because I find your
> replacement for it both confusingly organized and a bit misleading (for
> example saying the top bit is "wasted" is wrong, it's reserved because
> we need it free for the sign).
>
> (It is true that mine assumes that the reader knows what "excess-N"
> means, or can find out.)
>
> Here's mine, which is given as a single block comment:
>
> [ long explanatory comment ]
>
> Peter's, inline with the code (omitted here):
>
> [ long explanatory comment ]

In my opinion, Andrew's version is far clearer. Peter's version is full
of jargon that I can't understand. I could probably figure it out with a
few hours and a search engine, but that really shouldn't be necessary.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company