On Tue, May 3, 2011 at 3:06 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
>> Alvaro Herrera <alvhe...@commandprompt.com> writes:
>>> The interesting discussion is what happens next. To me, this is all
>>> related to this previous discussion:
>>> http://archives.postgresql.org/pgsql-hackers/2010-09/msg00232.php
>
>> Yeah, there doesn't seem like much point unless we have a clear idea
>> what we're going to do with the change.
>
> BTW, it occurs to me to wonder whether, instead of making types be more
> or less preferred, we should attack the issue from a different direction
> and assign preferred-ness ratings to casts. That seems to be more or
> less the direction that Robert was considering in the above-linked
> thread. I'm not sure it's better than putting the ratings on types ---
> in particular, neither viewpoint seems to offer a really clean answer
> about what to do when trying to resolve a multiple-argument function
> in which one possible resolution offers a more-preferred conversion for
> one argument but a less-preferred conversion for another one. But it's
> an alternative we ought to think about before betting all the chips on
> generalizing typispreferred.
>
> Personally I've always felt that the typispreferred mechanism was a bit
> of a wart; changing it from a bool to an int won't improve that, it'll
> just make it a more complicated wart. Casts have already got a
> standards-blessed notion that some are more equal than others, so
> maybe attaching preferredness ratings to them will be less of a wart.
> Not sure about it though.

I think this is a pretty good analysis. One of the big, fat problems
with typispreferred is that it totally falls apart when more than two
types are involved. For example, given a call f(int2), we can't
decide between f(int4) and f(int8), but it seems pretty clear (to me,
at least) that we should prefer to promote as little as possible and
should therefore pick f(int4). The problem is less acute with
string-like data types because there are only two typcategory-S data
types that get much use: text and varchar. But add a third type to
the mix (varchar2...) or start playing around with functions that are
defined for name and bpchar but not text or some such thing, and
things get sticky.

Generalizing typispreferred to an integer definitely helps with these
cases, assuming anyway that you are dealing mostly with built-in
types, or that the extensions you are using can somehow agree among
themselves on reasonable weighting values. But it is not a perfect
solution either, because it can really only handle pretty linear
topologies. It's reasonable to suppose that the integer types are
ordered int2 - int4 - int8 - numeric and that the floating point types
are ordered float4 - float8 (- numeric?), but I think the two
hierarchies are pretty much incomparable, and an integer
typispreferred won't handle that very well, unless we make the two
groups separate categories, but arguably numeric belongs in both
groups so that doesn't really seem to work very well either.
Certainly from a theoretical perspective there's no reason why you
couldn't have A - B - X and C - D - X, with A-C, A-D, B-C, and B-D
incomparable. It almost feels like you need a graph to model it
properly, which perhaps argues for your idea of attaching weights to
the casts.

But there are some problems with that, too. In particular, it would
be nice to be able to "hook in" new types with a minimum of fuss. For
example, say we add a new string type, like citext, via an extension.
Right now, we need to add casts not only from citext to text, but also
from citext to all the things to which text has casts, if we really
want citext to behave like text. That solution works OK for the first
extension type we load in, but as soon as you add any nonstandard
casts from text to other things (perhaps yet another extension type of
some kind), it starts to get a bit leaky.

In some sense it feels like it'd be nice to be able to "walk the
graph" - if an implicit cast from A to B is OK, and an implicit cast
from B to C is OK, perhaps an implicit cast from A to C is also OK.
But that seems awfully expensive to do at runtime, and it'd introduce
some strange behavior particularly with the way we have the reg* ->
oid and oid -> reg* casts set up.

select a.castsource::regtype, a.casttarget::regtype, b.casttarget::regtype
  from pg_cast a, pg_cast b
 where a.casttarget = b.castsource
   and a.castcontext = 'i' and b.castcontext = 'i'
   and not exists (select 1 from pg_cast x
                    where x.castsource = a.castsource
                      and x.casttarget = b.casttarget
                      and x.castcontext = 'i')
   and a.castsource <> b.casttarget;

It's not clear to me whether in any of this there is a solution to the
problem of int2 being a second-class citizen. Perhaps we could add
casts from int4 and int8 back to int2, and make it less-preferred than
all of the other integer types, but I'm not sure what else that would
break.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
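
For anyone who wants to play with the "walk the graph" idea without a
database handy, here is a rough Python sketch of the same two-hop check
the query above performs. It is only an illustration under stated
assumptions: the function name missing_two_hop_casts, the TOY_CASTS
list, and the type name xtype are all made up for this example, and the
list is not the real contents of pg_cast. Every pair in it stands for
an implicit cast.

```python
from typing import Iterable, List, Tuple

Cast = Tuple[str, str]  # (source type, target type), implicit casts only


def missing_two_hop_casts(implicit_casts: Iterable[Cast]) -> List[Tuple[str, str, str]]:
    """Return (source, via, target) triples where source -> via and
    via -> target are both implicit casts, but there is no direct
    implicit cast source -> target.  Round trips (source == target)
    are skipped, as in the query's a.castsource <> b.casttarget."""
    casts = set(implicit_casts)
    found = []
    for a_src, a_dst in casts:
        for b_src, b_dst in casts:
            if a_dst != b_src:
                continue  # the two casts do not chain
            if a_src == b_dst:
                continue  # e.g. oid -> regclass -> oid
            if (a_src, b_dst) in casts:
                continue  # a direct implicit cast already exists
            found.append((a_src, a_dst, b_dst))
    return sorted(found)


# Hypothetical toy cast set for illustration; NOT the real pg_cast.
TOY_CASTS = [
    ("citext", "text"),
    ("text", "xtype"),      # xtype: a made-up second extension type
    ("int2", "int4"),
    ("int4", "int8"),
    ("int2", "int8"),       # direct cast exists, so no gap is reported
    ("oid", "regclass"),
    ("regclass", "oid"),    # round trip, skipped
]

if __name__ == "__main__":
    for src, via, dst in missing_two_hop_casts(TOY_CASTS):
        print(f"{src} -> {via} -> {dst} has no direct implicit cast")
```

In this toy set the only gap reported is citext -> text -> xtype, which
is the "leaky" case described above: a second extension adds a cast
from text, and citext does not pick it up. The nested loop is quadratic
in the number of casts, which is fine for a sketch but says nothing
about what a real implementation in the parser would cost.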