This approach wouldn't work, I think.  The data keeps growing in the "hot"
table.

On Tue, Feb 10, 2015 at 6:01 AM, Melvin Davidson <melvin6...@gmail.com>
wrote:

> Well, without knowing too much about your application, it certainly sounds
> like using the metrics_YYYYMMDD approach is the way to go. As for modifying the
> constraint daily, couldn't you just use
>
> where timestamp > current_date - Interval '1 Day'
>
> ?
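>
> A minimal sketch of what that relative predicate could look like in a
> query (the table and column names here are invented for illustration):
>
>   SELECT *
>   FROM   metrics
>   WHERE  metric_ts > current_date - INTERVAL '1 day';
>
> (metric_ts standing in for whatever the timestamp column is actually
> called.)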
>
> On Mon, Feb 9, 2015 at 5:14 AM, Marc Mamin <m.ma...@intershop.de> wrote:
>
>>
>> >I have two partitioning questions I am hoping somebody can help me with.
>> >
>> >I have a fairly busy metric(ish) table. It gets a few million records
>> per day; the data is transactional for a while, but then settles down and
>> is used for analytical purposes later.
>> >
>> >When a metric is reported both the UTC time and the local times are
>> stored along with the other data belonging to the metric.
>>
>> Don't you have duplicate information within your UTC, location and
>> local_time data?
>> Maybe you can just attach a timezone to each location...
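>>
>> One possible shape for that, as a sketch only (table and column names are
>> invented, and it assumes the UTC column is a timestamptz):
>>
>>   CREATE TABLE locations (
>>       location_id integer PRIMARY KEY,
>>       tz          text NOT NULL      -- e.g. 'America/New_York'
>>   );
>>
>>   -- local time can then be derived instead of being stored twice
>>   SELECT m.utc_ts AT TIME ZONE l.tz AS local_ts
>>   FROM   metrics m
>>   JOIN   locations l USING (location_id);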
>>
>> >I want to partition this table both to make it faster to query and to
>> spread out the writes.  Ideally the partitions would be based on the UTC
>> timestamp and the sending location. For example
>> >
>> >metrics_location_XXXXX_2015_01_01
>> >
>> >The first problem with this approach is that there could be tens of
>> thousands of locations, so this is going to result in hundreds of thousands
>> of tables.  I know there is no upper limit to how many tables there can be,
>> but I am thinking this might really get me into trouble later.
>>
>> With only a few million rows per day, weekly or even monthly partitions
>> without regard to locations should be sufficient for the older data.
>> It should be possible to partition your hot data differently; but instead
>> of using one partition per location, you may use a hash/modulo approach to
>> keep the number of partitions at a reasonable count, if partitioning is
>> needed there at all (this can be helpful:
>> https://github.com/markokr/pghashlib). Here I would avoid including time
>> information except for the boundary between old and hot tables. And
>> depending on the pattern and performance requirements of your analytic
>> queries this may be sufficient (i.e. don't partition on the time at all).
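>>
>> A rough sketch of the hash/modulo idea with inheritance-based partitioning
>> (the table layout, column names and the choice of 8 buckets are all
>> invented; the routing trigger is omitted):
>>
>>   CREATE TABLE metrics_hot (
>>       location_id integer     NOT NULL,
>>       utc_ts      timestamptz NOT NULL,
>>       local_ts    timestamp,
>>       value       numeric
>>   );
>>
>>   -- one child per hash bucket instead of one child per location
>>   CREATE TABLE metrics_hot_0 (CHECK (location_id % 8 = 0)) INHERITS (metrics_hot);
>>   CREATE TABLE metrics_hot_1 (CHECK (location_id % 8 = 1)) INHERITS (metrics_hot);
>>   -- ... and so on up to metrics_hot_7
>>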
>> With smaller partitions for hot data, it should be quite fast to move
>> them one by one to the old data. I have no experience with the
>> trigger-based partitioning of Postgres (we handle partitions logically at
>> the application level), so I'm not sure how difficult this approach is. I
>> suppose that you'll need a function that moves data from the hot to the
>> old partitions and fixes the triggers accordingly.
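>>
>> Such a function could boil down to something along these lines (purely a
>> sketch with invented table names, assuming the hot and old tables share
>> the same column layout; the trigger maintenance is left out):
>>
>>   CREATE OR REPLACE FUNCTION move_hot_to_old(cutoff timestamptz)
>>   RETURNS bigint AS $$
>>   DECLARE
>>       moved bigint;
>>   BEGIN
>>       -- delete matured rows from the hot table and re-insert them
>>       -- into the old/archive table in a single statement
>>       WITH moved_rows AS (
>>           DELETE FROM metrics_hot
>>           WHERE  utc_ts < cutoff
>>           RETURNING *
>>       )
>>       INSERT INTO metrics_old
>>       SELECT * FROM moved_rows;
>>
>>       GET DIAGNOSTICS moved = ROW_COUNT;
>>       RETURN moved;
>>   END;
>>   $$ LANGUAGE plpgsql;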
>>
>> >
>> >The second, and possibly more vexing, problem is that the local time is
>> often queried.  Ideally I would like to put three constraints on the child
>> tables: location id, UTC timestamp and the local time, but obviously the
>> local timestamps would overlap with other locations in the same timezone.
>> Even if I were to partition only by UTC, the local timestamps would overlap
>> between tables.
>> >
>> >So the questions in a nutshell are.
>> >
>> >1. Should I be worried about having possibly hundreds of thousands of
>> shards?
>> >2. Is PG smart enough to handle overlapping constraints on tables and
>> limit its querying to only those tables that have the correct time
>> constraint?
>>
>> If you partition on the UTC time only, you don't have any overlap. When
>> querying on the local time, the planner will consider all partitions, but
>> an additional index or constraint on this column should be sufficient as
>> long as your partition count remains small.
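>>
>> For example (assuming monthly UTC partitions and invented column names):
>>
>>   -- child table constrained on the UTC timestamp only ...
>>   CREATE TABLE metrics_2015_01 (
>>       CHECK (utc_ts >= '2015-01-01' AND utc_ts < '2015-02-01')
>>   ) INHERITS (metrics);
>>
>>   -- ... plus a plain index so local-time queries stay cheap
>>   CREATE INDEX metrics_2015_01_local_ts_idx
>>       ON metrics_2015_01 (local_ts);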
>>
>> regards,
>> Marc Mamin
>>
>
>
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>
