Re: [GENERAL] Partioning with overlapping and non overlapping constraints

Brent Wood Mon, 09 Feb 2015 13:18:03 -0800

Hi Mark,

We have a somewhat similar situation - we have years of continuous data which 
are managed in PostGIS. The tables are partitioned into annual subsets. The 
previous (static) years' underlying tables have a clustered index on the UTC 
timestamp; the current year's table has a conventional index. It works well, 
with 20-odd partitions to date.
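
A minimal sketch of that layout, assuming inheritance-based partitioning. The table, column and index names (observations, obs_2014, obs_2015, utc_time, reading) are made up for illustration, and the PostGIS geometry columns are left out:

```sql
-- Hypothetical parent table; real tables would carry more columns.
CREATE TABLE observations (
    utc_time  timestamptz NOT NULL,
    reading   double precision
);

-- A closed (static) year, with a non-overlapping CHECK on the UTC timestamp.
CREATE TABLE obs_2014 (
    CHECK (utc_time >= '2014-01-01 00:00:00+00'
       AND utc_time <  '2015-01-01 00:00:00+00')
) INHERITS (observations);

CREATE INDEX obs_2014_utc_idx ON obs_2014 (utc_time);

-- Physically reorder the static table by its index. CLUSTER is a one-off
-- operation, so it suits a table that no longer receives new rows.
CLUSTER obs_2014 USING obs_2014_utc_idx;

-- The current year: same shape, plain index, no CLUSTER while rows arrive.
CREATE TABLE obs_2015 (
    CHECK (utc_time >= '2015-01-01 00:00:00+00'
       AND utc_time <  '2016-01-01 00:00:00+00')
) INHERITS (observations);

CREATE INDEX obs_2015_utc_idx ON obs_2015 (utc_time);
```

Once a year is closed, its table could be clustered the same way and then left alone.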

An annual basis for partitions may not be ideal in your case, but you have not 
specified how long it takes for your data to become fixed - or if there is a 
way the database can tell that records are now static. If there is, a scheduled 
task which migrates such records from a live to fixed partition would perhaps 
be appropriate.

Organising your data by UTC timestamp may be the simplest approach for you.

Cheers

Brent Wood

Programme leader: Environmental Information Delivery
NIWA
DDI:  +64 (4) 3860529
________________________________________
From: pgsql-general-ow...@postgresql.org [pgsql-general-ow...@postgresql.org] 
on behalf of Melvin Davidson [melvin6...@gmail.com]
Sent: Tuesday, February 10, 2015 6:01 AM
To: Marc Mamin
Cc: Tim Uckun; pgsql-general
Subject: Re: [GENERAL] Partioning with overlapping and non overlapping 
constraints

Well, without knowing too much about your application, it certainly sounds like 
using metrics_YYYYMMDD tables is the way to go. As for modifying the constraint 
daily, couldn't you just use

where timestamp > current_date - Interval '1 Day'

?

On Mon, Feb 9, 2015 at 5:14 AM, Marc Mamin <m.ma...@intershop.de> wrote:

>I have two partitioning questions I am hoping somebody can help me with.
>
>I have a fairly busy metric(ish) table. It gets a few million records per day, 
>the data is transactional for a while but then settles down and is used for 
>analytical purposes later.
>
>When a metric is reported both the UTC time and the local times are stored 
>along with the other data belonging to the metric.

Don't you have duplicate information within your UTC, location and local_time 
data?
Maybe you can just attach a timezone to each location...

>I want to partition this table to both make it faster to query and also to 
>spread out the writes.  Ideally the partitions would be based on the UTC 
>timestamp and the sending location. For example
>
>metrics_location_XXXXX_2015_01_01
>
>First problem with this approach is that there could be tens of thousands of 
>locations so this is going to result in hundreds of thousands of tables.   I 
>know there are no upper limits to how many tables there are but I am thinking 
>this might really get me into trouble later.

With only a few million rows per day, weekly or even monthly partitions 
without regard to location should be sufficient for older data.
It should be possible to partition your hot data differently, but instead of 
using one partition per location, you may use a hash/modulo approach to keep 
the number of partitions reasonable, if that is required at all (this can be 
helpful: https://github.com/markokr/pghashlib). Here I would avoid including 
time information except for the limit between old and hot tables. And depending 
on the pattern and performance requirements of your analytic queries, this may 
be sufficient (i.e. don't partition on the time at all).
With smaller partitions for hot data, it should be quite fast to move them one 
by one to the old data. I have no experience with the trigger-based 
partitioning of Postgres (we handle partitions logically at the application 
level), so I'm not sure how difficult this approach is. I suppose that you'll 
need a function that moves data from hot to old partitions and that fixes the 
triggers accordingly.
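
To make the idea concrete, here is a rough sketch under stated assumptions. It uses inheritance-based children and a plain modulo on location_id rather than pghashlib. All names (metrics, metrics_old, metrics_hot_N, move_hot_to_old), the column list and the bucket count of 4 are hypothetical, and the insert trigger that routes new rows is left out:

```sql
-- Hypothetical parent table; the column list is an assumption.
CREATE TABLE metrics (
    location_id integer          NOT NULL,
    utc_time    timestamptz      NOT NULL,
    local_time  timestamp        NOT NULL,
    value       double precision
);

-- Old data: a single child here; weekly or monthly children would also work.
CREATE TABLE metrics_old () INHERITS (metrics);

-- Hot data: 4 buckets by location_id modulo 4 (assumes non-negative ids).
CREATE TABLE metrics_hot_0 (CHECK (location_id % 4 = 0)) INHERITS (metrics);
CREATE TABLE metrics_hot_1 (CHECK (location_id % 4 = 1)) INHERITS (metrics);
CREATE TABLE metrics_hot_2 (CHECK (location_id % 4 = 2)) INHERITS (metrics);
CREATE TABLE metrics_hot_3 (CHECK (location_id % 4 = 3)) INHERITS (metrics);

-- Move rows older than the cutoff from one hot bucket into the old table.
-- The DELETE and the INSERT run as a single statement, so a row is either
-- in the hot table or in the old one, never in both or in neither.
CREATE FUNCTION move_hot_to_old(hot_table regclass, cutoff timestamptz)
RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
    moved bigint;
BEGIN
    EXECUTE format(
        'WITH gone AS (
             DELETE FROM ONLY %s WHERE utc_time < $1 RETURNING *
         )
         INSERT INTO metrics_old SELECT * FROM gone', hot_table)
    USING cutoff;
    GET DIAGNOSTICS moved = ROW_COUNT;  -- number of rows inserted into metrics_old
    RETURN moved;
END;
$$;

-- Example call: move everything older than 7 days out of bucket 0.
SELECT move_hot_to_old('metrics_hot_0', now() - interval '7 days');
```

As far as I know, the planner can only skip buckets if the query repeats the bucket expression, e.g. WHERE location_id = 42 AND location_id % 4 = 2. A bare location_id = 42 would still visit every hot table.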

>
>Second and possibly more vexing problem is that often the local time is 
>queried.  Ideally I would like to put three constraints on the child tables: 
>location id, UTC timestamp and the local time, but obviously the local 
>timestamps would overlap with other locations in the same timezone. Even if I 
>were to only partition by UTC, the local timestamps would overlap between 
>tables.
>
>So the questions in a nutshell are.
>
>1. Should I be worried about having possibly hundreds of thousands of shards?
>2. Is PG smart enough to handle overlapping constraints on tables and limit 
>its querying to only those tables that have the correct time constraint?

If you partition on the UTC time only, you don't have any overlap. When 
querying on the local time, the planner will consider all partitions, but an 
additional index or constraint on this column should be sufficient as long as 
your partition count remains small.
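
A small sketch of what this could look like with inheritance-based partitions. The names (samples, samples_2015_01, samples_2015_02) and the monthly granularity are assumptions for illustration only:

```sql
-- Hypothetical parent table.
CREATE TABLE samples (
    location_id integer          NOT NULL,
    utc_time    timestamptz      NOT NULL,
    local_time  timestamp        NOT NULL,
    value       double precision
);

-- Children partitioned on UTC only, so the CHECK ranges never overlap.
CREATE TABLE samples_2015_01 (
    CHECK (utc_time >= '2015-01-01 00:00:00+00'
       AND utc_time <  '2015-02-01 00:00:00+00')
) INHERITS (samples);

CREATE TABLE samples_2015_02 (
    CHECK (utc_time >= '2015-02-01 00:00:00+00'
       AND utc_time <  '2015-03-01 00:00:00+00')
) INHERITS (samples);

-- Additional index on the local time in every child.
CREATE INDEX samples_2015_01_local_idx ON samples_2015_01 (local_time);
CREATE INDEX samples_2015_02_local_idx ON samples_2015_02 (local_time);

-- 'partition' is the default setting; shown here only to be explicit.
SET constraint_exclusion = partition;

-- UTC predicate with constants: the January child can be excluded.
EXPLAIN
SELECT count(*) FROM samples
WHERE utc_time >= '2015-02-03 00:00:00+00'
  AND utc_time <  '2015-02-04 00:00:00+00';

-- Local-time predicate: no child can be excluded, but each one has a
-- local_time index available to the planner.
EXPLAIN
SELECT count(*) FROM samples
WHERE local_time >= '2015-02-03 00:00:00'
  AND local_time <  '2015-02-04 00:00:00';
```

Comparing the two plans should show how many children each query touches. On empty or tiny tables the planner may still prefer sequential scans over the indexes.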

regards,
Marc Mamin

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you. 
