Well, without knowing too much about your application, it certainly sounds
like using the metrics_YYYYMMDD scheme is the way to go. As for modifying the
constraint daily, couldn't you just use

    where timestamp > current_date - interval '1 day'

in your queries?
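Something like the sketch below, assuming a parent table named metrics with a
UTC timestamp column utc_ts (your names will differ). Each day's child table
carries a fixed check constraint written once at creation, while the query
predicate stays the same:

    -- child for one day; the check constraint is set once at creation
    -- and never needs to change afterwards
    create table metrics_20150209 (
        check (utc_ts >= timestamp '2015-02-09'
           and utc_ts <  timestamp '2015-02-10')
    ) inherits (metrics);

    -- the rolling predicate stays the same from day to day
    select *
    from metrics
    where utc_ts > current_date - interval '1 day';

The only daily task is creating the next child table ahead of time (e.g. from
cron); no existing constraint ever has to be altered. One caveat: constraint
exclusion prunes child tables at plan time only for literal comparisons, so
you may need the application to substitute the actual date rather than
current_date if you want the planner to skip old partitions.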
On Mon, Feb 9, 2015 at 5:14 AM, Marc Mamin <m.ma...@intershop.de> wrote:
>
> > I have two partitioning questions I am hoping somebody can help me with.
> >
> > I have a fairly busy metric(ish) table. It gets a few million records per
> > day, the data is transactional for a while but then settles down and is
> > used for analytical purposes later.
> >
> > When a metric is reported, both the UTC time and the local time are
> > stored along with the other data belonging to the metric.
>
> Don't you have duplicate information within your UTC, location and
> local_time data? Maybe you can just attach a timezone to each location...
>
> > I want to partition this table to both make it faster to query and also
> > to spread out the writes. Ideally the partitions would be based on the
> > UTC timestamp and the sending location. For example:
> >
> > metrics_location_XXXXX_2015_01_01
> >
> > The first problem with this approach is that there could be tens of
> > thousands of locations, so this is going to result in hundreds of
> > thousands of tables. I know there is no upper limit on how many tables
> > there can be, but I am thinking this might really get me into trouble
> > later.
>
> With only a few million rows per day, weekly or even monthly partitions
> without regard to location should be sufficient for older data.
> It should be possible to partition your hot data differently. But instead
> of using one partition per location, you may use a hash/modulo approach to
> keep the number of partitions to a reasonable count, if required at all
> (this can be helpful: https://github.com/markokr/pghashlib). Here I would
> avoid including time information except for the boundary between old and
> hot tables. And depending on the pattern and performance requirements of
> your analytic queries this may be sufficient (i.e. don't partition on time
> at all).
> With smaller partitions for hot data, it should be quite fast to move them
> one by one to the old data. I have no experience with the trigger-based
> partitioning of Postgres (we handle partitions logically at the
> application level), so I'm not sure how difficult this approach is. I
> suppose that you'll need a function that moves data from the hot to the
> old partitions and fixes the triggers accordingly.
>
> > The second and possibly more vexing problem is that often the local time
> > is queried. Ideally I would like to put three constraints on the child
> > tables: location id, UTC timestamp and local time. But obviously the
> > local timestamps would overlap with other locations in the same
> > timezone, and even if I were to partition only by UTC, the local
> > timestamps would overlap between tables.
> >
> > So the questions in a nutshell are:
> >
> > 1. Should I be worried about having possibly hundreds of thousands of
> > shards?
> > 2. Is PG smart enough to handle overlapping constraints on tables and
> > limit its querying to only those tables that have the correct time
> > constraint?
>
> If you partition on the UTC time only, you don't have overlapping. When
> querying on the local time, the planner will consider all partitions, but
> an additional index or constraint on this column should be sufficient as
> long as your partition count remains small.
>
> regards,
> Marc Mamin

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you wish to share my
fantasy is entirely up to you.
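P.S. For concreteness, Marc's hash/modulo idea for the hot data could look
roughly like the sketch below. This is untested, and the names (metrics,
location_id, four buckets) are invented for illustration. Note also that
hashtext() is not guaranteed to be stable across major versions, which is
presumably why Marc points at pghashlib:

    -- one of four hot buckets; metrics_hot_1..3 follow the same pattern
    create table metrics_hot_0 (
        check (mod(hashtext(location_id::text) & 2147483647, 4) = 0)
    ) inherits (metrics);

    -- route inserts on the parent to the matching bucket
    create or replace function metrics_route() returns trigger as $$
    begin
        execute format('insert into metrics_hot_%s select ($1).*',
                       mod(hashtext(new.location_id::text) & 2147483647, 4))
        using new;
        return null;  -- the row is already stored in the child
    end;
    $$ language plpgsql;

    create trigger metrics_route_trg
        before insert on metrics
        for each row execute procedure metrics_route();

The bucket expression in the trigger must match the check constraints
exactly, otherwise the routed rows will violate them.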