Well, without knowing too much about your application, it certainly sounds
like using the metrics_YYYYMMDD scheme is the way to go. As for modifying the
constraint daily, couldn't you just use

    where timestamp > current_date - interval '1 day'

in your queries?
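Something like the sketch below, assuming a parent table named metrics with a
UTC timestamp column utc_ts (your names will differ). Each day's child table
carries a fixed check constraint written once at creation, while the query
predicate stays the same:

    -- child for one day; the check constraint is set once at creation
    -- and never needs to change afterwards
    create table metrics_20150209 (
        check (utc_ts >= timestamp '2015-02-09'
           and utc_ts <  timestamp '2015-02-10')
    ) inherits (metrics);

    -- the rolling predicate stays the same from day to day
    select *
    from metrics
    where utc_ts > current_date - interval '1 day';

The only daily task is creating the next child table ahead of time (e.g. from
cron); no existing constraint ever has to be altered. One caveat: constraint
exclusion prunes child tables at plan time only for literal comparisons, so
you may need the application to substitute the actual date rather than
current_date if you want the planner to skip old partitions.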
On Mon, Feb 9, 2015 at 5:14 AM, Marc Mamin <m.ma...@intershop.de> wrote:
>
> > I have two partitioning questions I am hoping somebody can help me with.
> >
> > I have a fairly busy metric(ish) table. It gets a few million records per
> > day, the data is transactional for a while but then settles down and is
> > used for analytical purposes later.
> >
> > When a metric is reported, both the UTC time and the local time are
> > stored along with the other data belonging to the metric.
>
> Don't you have duplicate information within your UTC, location and
> local_time data? Maybe you can just attach a timezone to each location...
>
> > I want to partition this table to both make it faster to query and also
> > to spread out the writes. Ideally the partitions would be based on the
> > UTC timestamp and the sending location. For example:
> >
> > metrics_location_XXXXX_2015_01_01
> >
> > The first problem with this approach is that there could be tens of
> > thousands of locations, so this is going to result in hundreds of
> > thousands of tables. I know there is no upper limit on how many tables
> > there can be, but I am thinking this might really get me into trouble
> > later.
>
> With only a few million rows per day, weekly or even monthly partitions
> without regard to location should be sufficient for older data.
> It should be possible to partition your hot data differently. But instead
> of using one partition per location, you may use a hash/modulo approach to
> keep the number of partitions to a reasonable count, if required at all
> (this can be helpful: https://github.com/markokr/pghashlib). Here I would
> avoid including time information except for the boundary between old and
> hot tables. And depending on the pattern and performance requirements of
> your analytic queries this may be sufficient (i.e. don't partition on time
> at all).
> With smaller partitions for hot data, it should be quite fast to move them
> one by one to the old data. I have no experience with the trigger-based
> partitioning of Postgres (we handle partitions logically at the
> application level), so I'm not sure how difficult this approach is. I
> suppose that you'll need a function that moves data from the hot to the
> old partitions and fixes the triggers accordingly.
>
> > The second and possibly more vexing problem is that often the local time
> > is queried. Ideally I would like to put three constraints on the child
> > tables: location id, UTC timestamp and local time. But obviously the
> > local timestamps would overlap with other locations in the same
> > timezone, and even if I were to partition only by UTC, the local
> > timestamps would overlap between tables.
> >
> > So the questions in a nutshell are:
> >
> > 1. Should I be worried about having possibly hundreds of thousands of
> > shards?
> > 2. Is PG smart enough to handle overlapping constraints on tables and
> > limit its querying to only those tables that have the correct time
> > constraint?
>
> If you partition on the UTC time only, you don't have overlapping. When
> querying on the local time, the planner will consider all partitions, but
> an additional index or constraint on this column should be sufficient as
> long as your partition count remains small.
>
> regards,
> Marc Mamin

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you wish to share my
fantasy is entirely up to you.
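P.S. For concreteness, Marc's hash/modulo idea for the hot data could look
roughly like the sketch below. This is untested, and the names (metrics,
location_id, four buckets) are invented for illustration. Note also that
hashtext() is not guaranteed to be stable across major versions, which is
presumably why Marc points at pghashlib:

    -- one of four hot buckets; metrics_hot_1..3 follow the same pattern
    create table metrics_hot_0 (
        check (mod(hashtext(location_id::text) & 2147483647, 4) = 0)
    ) inherits (metrics);

    -- route inserts on the parent to the matching bucket
    create or replace function metrics_route() returns trigger as $$
    begin
        execute format('insert into metrics_hot_%s select ($1).*',
                       mod(hashtext(new.location_id::text) & 2147483647, 4))
        using new;
        return null;  -- the row is already stored in the child
    end;
    $$ language plpgsql;

    create trigger metrics_route_trg
        before insert on metrics
        for each row execute procedure metrics_route();

The bucket expression in the trigger must match the check constraints
exactly, otherwise the routed rows will violate them.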