On Fri, Jan 11, 2008 at 11:49:50AM +0000, Simon Riggs wrote:
> On Fri, 2008-01-11 at 10:25 +0100, Gavin Sherry wrote:
> > >
> > > Of course. It's an identical situation for both. Regrettably, none of
> > > your comments about dynamic partitioning and planning were accurate as a
> > > result.
> >
> > That's not true. We will still have planning drive the partition
> > selection when the predicate is immutable, thus having more accurate
> > plans.
>
> Not really.
>
> The planner already evaluates stable functions at plan time to estimate
> selectivity against statistics. It can do the same here.
>
> The boundary values can't be completely trusted at plan time because
> they are dynamic, but they're at least as accurate as ANALYZE statistics
> (and probably derived at identical times), so can be used as estimates.
> So I don't see any reason to imagine the plans will be badly adrift.
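(To make the immutable-versus-stable distinction concrete, here is a
minimal sketch using the existing inheritance-plus-CHECK-constraint
style of partitioning; the table and column names are purely
illustrative.)

  -- Parent table plus one child partition constrained to a date range.
  CREATE TABLE measurements (logdate date, reading int);
  CREATE TABLE measurements_2008_01 (
      CHECK (logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01')
  ) INHERITS (measurements);

  SET constraint_exclusion = on;

  -- Immutable predicate: the planner can prove at plan time that only
  -- measurements_2008_01 can hold matching rows, so the other children
  -- are excluded from the plan.
  EXPLAIN SELECT * FROM measurements WHERE logdate = DATE '2008-01-15';

  -- Stable predicate: current_date is not known until execution, so the
  -- planner cannot exclude children outright; it can only use statistics
  -- (or, in the proposal above, segment boundary values) as estimates.
  EXPLAIN SELECT * FROM measurements WHERE logdate = current_date;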
Okay, it's good that you want the planner to look at those. Did you
consider the point I made about the sheer amount of data the planner
would have to consider for large cases?

> We're back to saying that if the visibility map is volatile, then SE
> won't help you much. I agree with that and haven't argued otherwise.
> Does saying it make us throw away SE? No, at least, not yet and not for
> that reason.

Yes, I'm not against SE; I just think that having only SE would be a
serious regression for larger users.

Personally, I think SE would be a great idea for append-only tables,
since that removes the thing I'm most worried about with it: the need
to vacuum to 'turn it on'.

>
> SE does what I was looking for it to do, but doesn't do all of what
> you'd like to achieve with partitioning, because we're looking at
> different use cases. I'm sure you'd agree that all large databases are
> not the same and that they can have very different requirements. I'd
> characterise our recent positions on this that I've been focused on
> archival requirements, whereas you've been focused on data warehousing.

I think that sums it up, although I'd also say that declarative
partitioning suits anyone with largish amounts of data who knows how
they want it stored. This points to another case that SE suits: those
who don't know how, or (maybe more importantly) don't care, to manage
their data.

I'll go back to what I said above: SE looks like a good performance
boost for archival, read-only data. If we tighten up the definition of
how some tables can be used -- append-only -- then we can remove the
vacuum requirement and also change other characteristics of the
storage: reduced visibility information, compression, etc. These are
hot topics for people with that kind of data.

> The difference really lies in how much activity and of what kind occurs
> on a table. You don't unload and reload archives regularly, nor do you

I couldn't agree more.

> perform random updates against the whole table. I'm sure we'd quickly
> agree that many of the challenges you've seen recently at Greenplum
> would not be possible with core Postgres, so I'm not really trying too
> hard to compete.

There we diverge. Yes, Greenplum produces systems for very large
amounts of data, petabyte range in fact. However, the architecture --
individual postmasters, each on its own CPU with its own storage --
means that at the node level we see the same problems as users of
non-distributed databases. This is why I say that VACUUMing such
systems under the SE model after a data load is just impossible (I know
the cost of vacuum will stabilise over time, but there's always the
initial data load).

Thanks,

Gavin