On 27/05/10 13:37, Mark Kirkwood wrote:
On 25/05/10 16:43, Mark Kirkwood wrote:
Today I ran into some interesting consequences of the xml data type
being without an "=" operator. One I thought I'd post here because it
has a *possible* planner impact. I'm not sure it is actually a bug as
such, but this seemed the best forum to post in initially:
test=# \d bug
Table "public.bug"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
val | xml |
test=# explain select val::text from bug;
QUERY PLAN
--------------------------------------------------------------
Seq Scan on bug (cost=0.00..58127.78 rows=1000278 width=32)
Note the width estimate. However a more realistic estimate for width is:
test=# select 8192/(reltuples/relpages) as width from pg_class where
relname='bug';
width
------------------
394.130431739976
So we are going to massively underestimate the "size" of such a
dataset. Now this appears to be a consequence of no "=" operator
(std_typanalyze in analyze.c bails if there isn't one), so the
planner has no idea about how wide 'val' actually is. I'm wondering
if it is worth having at least an "=" operator to enable some minimal
stats to be available for xml columns.
Adding a minimal = op (see attached) and an analyze results in:
test=# explain select val::text from bug;
QUERY PLAN
---------------------------------------------------------------
Seq Scan on bug (cost=0.00..62632.08 rows=1000008 width=385)
which gives a much better indication of dataset size.
Maybe I gave this guy a bad title - is it a concern that the 'width'
estimate is so far off for xml datatypes (because of no = op)? It seemed
to me that this could result in some bad plan choices (e.g in subqueries
etc).
regards
Mark