I wrote:
> ... the leaf tuple datatype is hard-wired to be
> the same as the indexed column's type.  Why is that?  It seems to me
> to be both confusing and restrictive.  For instance, if you'd designed
> the suffix tree opclass just a little differently, it would be wanting
> to store "char" not text in leaf nodes.  Shouldn't we change this to
> allow the leaf data type to be specified explicitly?

After another day's worth of hacking, I now understand the reason for
the above: when an index is less than a page and an incoming value would
still fit on the root page, the incoming value is simply dumped into a
leaf tuple without ever calling any opclass-specific function at all.
To allow the leaf datatype to be different from the indexed column,
we'd need at least one more opclass support function, and it's not clear
that the potential gain is worth any extra complexity.

However, I now have another question: what is the point of the
allTheSame mechanism?  It seems to add quite a great deal of complexity,
without saving much of any space.  At least for the node key types used
so far, the null bitmap that's added to the node tuples eats just as
much space as storing a normal key would.  We could probably avoid that
by using custom tuple construction code instead of index_form_tuple,
but on the whole I think it'd be better to remove the concept.

For one thing, it's giving me fits while attempting to fix the
limitation on storing long indexed values.  There's no reason why a
suffix tree representation shouldn't work for long strings, but you have
to be willing to cap the length of any given inner tuple's prefix to
something that will fit on a page --- and that breaks the
badly-underdocumented allTheSame logic in spgtextproc.c.  You can't
choose to just drop the node key (i.e., the next byte in the original
string) unless there isn't any because you reached the end of the
string.  In general it's not clear to me why it's sensible to drop a
node key ever.

I'm also still wondering what your thoughts are on storing null values
versus full-index-scan capability.  I'm leaning towards getting rid of
the dead code, but if you have an idea how to remove the limitation,
maybe we should do that instead.

			regards, tom lane
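
To make the space argument about allTheSame above more concrete, here is a rough sketch of the arithmetic, not code from the patch. It assumes an index tuple layout along the lines of what index_form_tuple produces: an 8-byte header, a 4-byte null bitmap that is added only when some key is null, and 8-byte maximum alignment. The constants and the helper names (sketch_maxalign, sketch_tuple_size) are hypothetical and chosen for illustration only; real sizes depend on the platform and build options.

```c
#include <stddef.h>

/* Assumed constants, loosely modeled on PostgreSQL's index tuple layout. */
#define SKETCH_MAXALIGN     8   /* assumed maximum alignment (typical 64-bit) */
#define SKETCH_HEADER_SIZE  8   /* assumed: 6-byte item pointer + 2-byte info */
#define SKETCH_NULL_BITMAP  4   /* assumed: bitmap sized for 32 index keys */

/* Round len up to the next multiple of SKETCH_MAXALIGN. */
static size_t
sketch_maxalign(size_t len)
{
    return (len + SKETCH_MAXALIGN - 1) & ~((size_t) SKETCH_MAXALIGN - 1);
}

/*
 * Hypothetical helper: size of a one-key tuple.  A null key stores no
 * data but forces the null bitmap into the header; a non-null key
 * stores key_len bytes after a header without a bitmap.
 */
size_t
sketch_tuple_size(int key_is_null, size_t key_len)
{
    size_t      hoff = SKETCH_HEADER_SIZE;

    if (key_is_null)
        hoff += SKETCH_NULL_BITMAP;
    hoff = sketch_maxalign(hoff);

    return sketch_maxalign(hoff + (key_is_null ? 0 : key_len));
}
```

Under these assumptions, sketch_tuple_size(1, 0) (a dropped, null node key) and sketch_tuple_size(0, 1) (a one-byte "char" node key) both come out to 16 bytes, which is the sense in which the null bitmap eats as much space as a normal key would.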