Re: Separate 100 M spatial data in 100 tables VS one big table

kimaidou Wed, 06 Mar 2024 08:13:42 -0800

Hi !

I would like to thank you all for your detailed answers and explanations.
I would give "partitioning" a try, by creating a dedicated new partition
table, and insert a (big enough) extract of the source data in it.


You are right, the best would be to try in real life !

Best wishes
Kimaidou

Le mardi 5 mars 2024, Tomas Vondra <tomas.von...@enterprisedb.com> a écrit :

> On 3/5/24 13:47, Marc Millas wrote:
> > Salut Kimaidou,
> > why not a partitioned table with the department a partitioning Key ?
> > each year just detach the obsolete data, department by
> > department (ie.detach the partition, almost instantaneous) and drop or
> keep
> > the obsolete data.
> > No delete, quite easy to maintain. For each global index, Postgres will
> > create one index per each partition. and detach them when you detach a
> > department partition.
> > so when importing, first create an appropriate table, load the data, and
> > attach it to the main partitioned table. Postgres will
> > automatically recreate all necessary indexes.
> >
>
> Yes, a table partitioned like this is certainly a valid option - and
> it's much better than the view with a UNION of all the per-department
> tables. The optimizer has very little insight into the view, which
> limits how it can optimize queries. For example if the query has a
> condition like
>
>    WHERE department = 'X'
>
> with the declarative partitioning the planner can eliminate all other
> partitions (and just ignore them), while with the view it will have to
> scan all of them.
>
> But is partitioning a good choice? Who knows - it makes some operations
> simpler (e.g. you can detach/drop a partition instead of deleting the
> rows), but it also makes other operations less efficient. For example a
> query that can't eliminate partitions has to do more stuff during
> execution.
>
> So to answer this we'd need to know how often stuff like bulk deletes /
> reloads happen, what queries will be executed, and so on. Both options
> (non-partitioned and partitioned table) are valid, but you have to try.
>
> Also, partitioned table may not support / allow some features - for
> example unique keys that don't contain the partition key. We're
> improving this in every release, but there will always be a gap.
>
> I personally would start with non-partitioned table, because that's the
> simplest option. And once I get a better idea how often the reloads
> happen, I'd consider if that's something worth the extra complexity of
> partitioning the data. If it happens only occasionally (a couple times a
> year), it probably is not. You'll just delete the data and reuse the
> space for new data.
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Re: Separate 100 M spatial data in 100 tables VS one big table

Reply via email to