I don't think this approach would work. The data keeps growing in the "hot" table.
On Tue, Feb 10, 2015 at 6:01 AM, Melvin Davidson <melvin6...@gmail.com> wrote:

> Well, without knowing too much about your application, it certainly sounds
> like using the metrics_YYYYMMDD approach is the way to go. As for modifying
> the constraint daily, couldn't you just use
>
>     where timestamp > current_date - Interval '1 Day'
>
> ?
>
> On Mon, Feb 9, 2015 at 5:14 AM, Marc Mamin <m.ma...@intershop.de> wrote:
>
>> >I have two partitioning questions I am hoping somebody can help me with.
>> >
>> >I have a fairly busy metric(ish) table. It gets a few million records per
>> >day; the data is transactional for a while, but then settles down and is
>> >used for analytical purposes later.
>> >
>> >When a metric is reported, both the UTC time and the local time are stored
>> >along with the other data belonging to the metric.
>>
>> Don't you have duplicate information within your UTC, location and
>> local_time data? Maybe you can just attach a timezone to each location...
>>
>> >I want to partition this table to both make it faster to query and also to
>> >spread out the writes. Ideally the partitions would be based on the UTC
>> >timestamp and the sending location. For example:
>> >
>> >metrics_location_XXXXX_2015_01_01
>> >
>> >The first problem with this approach is that there could be tens of
>> >thousands of locations, so this is going to result in hundreds of thousands
>> >of tables. I know there is no upper limit on how many tables there can be,
>> >but I am thinking this might really get me into trouble later.
>>
>> With only a few million rows per day, weekly or even monthly partitions,
>> without regard to location, should be sufficient for older data.
>> It should be possible to partition your hot data differently. Instead of
>> using one partition per location, you could use a hash/modulo approach to
>> keep the number of partitions at a reasonable count, if partitioning by
>> location is required at all (this can be helpful:
>> https://github.com/markokr/pghashlib). Here I would avoid including time
>> information, except for the boundary between old and hot tables. Depending
>> on the pattern and performance requirements of your analytic queries this
>> may be sufficient (i.e. don't partition on time at all).
>> With smaller partitions for hot data, it should be quite fast to move them
>> one by one to the old data. I have no experience with the trigger-based
>> partitioning of Postgres (we handle partitions logically at the application
>> level), so I'm not sure how difficult this approach is. I suppose that
>> you'll need a function that moves data from hot to old partitions and fixes
>> the triggers accordingly.
>>
>> >The second and possibly more vexing problem is that the local time is
>> >often queried. Ideally I would like to put three constraints on the child
>> >tables: location id, UTC timestamp and local time. But obviously the local
>> >timestamps would overlap with other locations in the same timezone, and
>> >even if I were to partition only by UTC, the local timestamps would overlap
>> >between tables.
>> >
>> >So the questions in a nutshell are:
>> >
>> >1. Should I be worried about having possibly hundreds of thousands of
>> >shards?
>> >2. Is PG smart enough to handle overlapping constraints on tables and
>> >limit its querying to only those tables that have the correct time
>> >constraint?
>>
>> If you partition on the UTC time only, you don't have overlapping
>> constraints.
>> When querying on the local time, the planner will consider all partitions,
>> but an additional index or constraint on this column should be sufficient
>> as long as your partition count remains small.
>>
>> regards,
>>
>> Marc Mamin
>
>
> --
> Melvin Davidson
> I reserve the right to fantasize. Whether or not you
> wish to share my fantasy is entirely up to you.
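
For reference, here is a minimal sketch of the UTC-only, inheritance-based layout Marc describes above. This thread predates declarative partitioning, so the children are created with INHERITS plus CHECK constraints and a routing trigger. All table and column names (metrics, ts_utc, ts_local, location_id) are assumptions for illustration, not taken from the thread:

-- Parent table; children carry non-overlapping CHECK constraints on the
-- UTC timestamp only, so constraint exclusion can prune by UTC range.
CREATE TABLE metrics (
    location_id integer     NOT NULL,
    ts_utc      timestamptz NOT NULL,
    ts_local    timestamp   NOT NULL,
    value       numeric
);

CREATE TABLE metrics_20150209 (
    CHECK (ts_utc >= '2015-02-09 00:00+00' AND ts_utc < '2015-02-10 00:00+00')
) INHERITS (metrics);

CREATE TABLE metrics_20150210 (
    CHECK (ts_utc >= '2015-02-10 00:00+00' AND ts_utc < '2015-02-11 00:00+00')
) INHERITS (metrics);

-- Route inserts on the parent into the child matching the UTC date.
-- A child for that date must already exist, or the insert will fail.
CREATE OR REPLACE FUNCTION metrics_insert_router() RETURNS trigger AS $$
BEGIN
    EXECUTE format('INSERT INTO %I SELECT ($1).*',
                   'metrics_' || to_char(NEW.ts_utc AT TIME ZONE 'UTC', 'YYYYMMDD'))
    USING NEW;
    RETURN NULL;  -- the row has already been stored in the child
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER metrics_route
    BEFORE INSERT ON metrics
    FOR EACH ROW EXECUTE PROCEDURE metrics_insert_router();

With only ts_utc in the CHECK constraints the children never overlap, so a query such as SELECT ... FROM metrics WHERE ts_utc >= '2015-02-10 00:00+00' AND ts_utc < '2015-02-11 00:00+00' only scans metrics_20150210.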
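
And a sketch of the two remaining pieces discussed in the thread: a per-child index on the local-time column (the planner still visits every child for local-time filters, but each scan stays cheap), and one possible shape for the "move hot data to old data" step. Again, the names continue the assumed example above rather than the thread's real schema:

-- Index the local-time column on each child for queries filtering on it.
CREATE INDEX ON metrics_20150209 (ts_local);
CREATE INDEX ON metrics_20150210 (ts_local);

-- A coarser monthly "old" partition that receives settled rows.
CREATE TABLE metrics_old_201502 (
    CHECK (ts_utc >= '2015-02-01 00:00+00' AND ts_utc < '2015-03-01 00:00+00')
) INHERITS (metrics);

-- Move settled rows out of a hot child in one atomic statement;
-- the emptied child can then be dropped or kept for reuse.
WITH moved AS (
    DELETE FROM metrics_20150209
    WHERE ts_utc < '2015-02-10 00:00+00'
    RETURNING *
)
INSERT INTO metrics_old_201502
SELECT * FROM moved;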