Re: [GENERAL] Large data and slow queries

2017-04-27 Thread Samuel Williams
Citus is also now just an extension. Sounds pretty awesome, I'll certainly consider your system if/when we decide to make changes. On 22 April 2017 at 08:41, Andrew Staller wrote: > Samuel, > > Short answer to your questions: (1) TimescaleDB and CitusDB are focusing on > solving different probl

Re: [GENERAL] Large data and slow queries

2017-04-21 Thread Andrew Staller
Samuel, Short answer to your questions: (1) TimescaleDB and CitusDB are focusing on solving different problems, and (2) TimescaleDB is an Apache 2-licensed extension to run in your Postgres database, not a fork or different system. Longer answer to your first question: From what we've read and

Re: [GENERAL] Large data and slow queries

2017-04-20 Thread Samuel Williams
Scott, Vick, the vast majority of the data is generic. But there are some specific events we need to look up quickly which are probably less than a few 100,000 records. We did evaluate partial indexes vs full indexes. The partial index speeds up our specific queries significantly while only taking
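The trade-off described above can be sketched as follows. This is a hypothetical example (the index and event names are assumptions, not from the thread); `pg_relation_size` makes the size difference concrete, since a partial index stores only the rows matching its predicate.

```sql
-- Full index over every value of "what" vs. a partial index covering
-- only the specific events that need fast lookup (names illustrative).
CREATE INDEX user_event_full_idx ON user_event (what, user_id);

CREATE INDEX user_event_special_idx ON user_event (user_id)
    WHERE what = 'special_event';

-- Compare on-disk sizes; the partial index covering ~100,000 rows
-- should be a tiny fraction of the full one.
SELECT pg_size_pretty(pg_relation_size('user_event_full_idx')),
       pg_size_pretty(pg_relation_size('user_event_special_idx'));
```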

Re: [GENERAL] Large data and slow queries

2017-04-20 Thread Samuel Williams
Andrew, how would timescaledb compare to citus - and is timescaledb an extension to postgres or is it an entirely separate system? On 21 April 2017 at 02:44, Andrew Staller wrote: > Awesome thread. > > Samuel, > > Just wanted you to be aware of the work we're doing at TimescaleDB > (http://www.ti

Re: [GENERAL] Large data and slow queries

2017-04-20 Thread Scott Marlowe
On Thu, Apr 20, 2017 at 6:30 AM, Vick Khera wrote: > I'm curious why you have so many partial indexes. Are you trying to make > custom indexes per query? It seems to me you might want to consider making > the indexes general, and remove the redundant ones (that have the same > prefix list of index

Re: [GENERAL] Large data and slow queries

2017-04-20 Thread Andrew Staller
Awesome thread. Samuel, Just wanted you to be aware of the work we're doing at TimescaleDB ( http://www.timescale.com/), a time-series database extension for PostgreSQL. Some of how we might help you: - automatic partitioning by space (primary key - like country_id, for instance) and time. This
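The space-and-time partitioning mentioned above is driven by TimescaleDB's `create_hypertable` call. A minimal sketch, assuming a `created_at` time column and using `country_id` as the space-partitioning key (both column names are assumptions here):

```sql
-- Load the extension, then convert a plain table into a hypertable
-- partitioned by time and, optionally, by a space key.
CREATE EXTENSION IF NOT EXISTS timescaledb;

SELECT create_hypertable('user_event', 'created_at',
                         'country_id', 4);  -- 4 space partitions
```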

Re: [GENERAL] Large data and slow queries

2017-04-20 Thread Vick Khera
I'm curious why you have so many partial indexes. Are you trying to make custom indexes per query? It seems to me you might want to consider making the indexes general, and remove the redundant ones (that have the same prefix list of indexed fields). Secondly your table is 102Gb. Clearly there's a
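The redundancy being pointed out: a btree index can serve any query on a leading prefix of its columns, so separate indexes on those prefixes add write overhead and disk space for no read benefit. A sketch with assumed column names:

```sql
-- The first two indexes are redundant: the three-column btree can
-- also serve queries filtering only on (what) or (what, created_at).
CREATE INDEX idx_what              ON user_event (what);
CREATE INDEX idx_what_created      ON user_event (what, created_at);
CREATE INDEX idx_what_created_user ON user_event (what, created_at, user_id);

-- Keep only the widest index; drop the prefix duplicates.
DROP INDEX idx_what;
DROP INDEX idx_what_created;
```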

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Samuel Williams
Okay, so after changing longitude/latitude to float4, and re-organizing the table a bit, I got the query down from about 8 minutes to 40 seconds. The details are in the gist comments: https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121 Now, just need to get performance another 2 ord
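The datatype change described above can be done in place. Narrowing `double precision` (8 bytes) to `real` (4 bytes) halves the coordinate storage in both the heap and every index covering those columns; `real` keeps roughly six significant digits, which is about meter-level precision for latitude/longitude:

```sql
-- Rewrite the table with 4-byte floats for both coordinates.
-- Note: this takes a full-table lock and rewrites all rows.
ALTER TABLE user_event
    ALTER COLUMN latitude  TYPE real,
    ALTER COLUMN longitude TYPE real;
```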

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Alban Hertroys
> On 19 Apr 2017, at 12:58, Samuel Williams > wrote: > > It's interesting you talk about using multiple indexes. In > MySQL/MariaDB and derivatives, I've never seen the query planner > consider using multiple indexes. So, it's possible that Postgres may > use multiple indexes if it saves time?
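To answer the question in the preview above: yes, PostgreSQL can combine several single-column indexes in one query by building in-memory bitmaps and intersecting them (a BitmapAnd node). A hedged sketch of how to check whether the planner does this for a given query:

```sql
-- Two independent single-column indexes.
CREATE INDEX user_event_lat_idx ON user_event (latitude);
CREATE INDEX user_event_lon_idx ON user_event (longitude);

-- If the plan shows "BitmapAnd" over two Bitmap Index Scans, the
-- planner intersected both indexes before touching the heap.
EXPLAIN
SELECT user_id FROM user_event
WHERE latitude  BETWEEN -37.03 AND -36.67
  AND longitude BETWEEN 174.63 AND 175.08;
```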

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Samuel Williams
Ah right, yeah, it's insert only. So, it's never been vacuumed. On 20 April 2017 at 01:25, Stephen Frost wrote: > Greetings, > > * Samuel Williams (space.ship.travel...@gmail.com) wrote: >> Thanks for all the suggestions Stephen. >> >> > That explain analyze shows a whole ton of heap fetches. Wh

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Stephen Frost
Greetings, * Samuel Williams (space.ship.travel...@gmail.com) wrote: > Thanks for all the suggestions Stephen. > > > That explain analyze shows a whole ton of heap fetches. When was the > last time a VACUUM was run on this table, to build the visibility map? > Without the visibility map being cu

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Samuel Williams
Thanks for all the suggestions Stephen. > That explain analyze shows a whole ton of heap fetches. When was the last time a VACUUM was run on this table, to build the visibility map? Without the visibility map being current, an Index-Only Scan, as is happening here, can really suck. I'm using the
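The visibility-map issue raised above matters most for insert-only tables: with no dead rows, autovacuum rarely runs, the visibility map stays stale, and index-only scans degrade into per-row heap fetches. A manual pass fixes it:

```sql
-- Rebuild the visibility map so index-only scans can skip the heap.
VACUUM (VERBOSE) user_event;

-- Afterwards, EXPLAIN (ANALYZE) on the query should report
-- "Heap Fetches: 0" (or close to it) on the Index Only Scan node.
```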

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Stephen Frost
Greetings, * Samuel Williams (space.ship.travel...@gmail.com) wrote: > We want the following kinds of query to be fast: "kinds of query" isn't helpful, you should be reviewing exactly the queries you care about because statistics and your exact data set and what the exact query you're running is

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Martijn Tonies (Upscene Productions)
Hello Vinny, Samuel, Perhaps I'm missing something, but I'd be interested in the reasoning behind this. For column 'what', it seems you have no index on all values, only indices with specific values for 'what'. How does this speed up the search? Will PostgreSQL use those indices, instead of u

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread vinny
On 2017-04-19 13:25, Martijn Tonies (Upscene Productions) wrote: Samuel, others, Perhaps I'm missing something, but I'd be interested in the reasoning behind this. For column 'what', it seems you have no index on all values, only indices with specific values for 'what'. How does this speed up

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Samuel Williams
Martijn that is a good question. It's because we are only concerned with a subset of events for this index and this particular query. The query planner can recognise this and use the index correctly. By doing this, we reduce the size of the index significantly. In the best case, where we only wante
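The mechanism described above: the planner uses a partial index whenever it can prove the query's WHERE clause implies the index predicate. A sketch with assumed event names:

```sql
-- Index only the subset of events this query family cares about.
CREATE INDEX user_event_subset_idx
    ON user_event (user_id)
    WHERE what IN ('signup', 'purchase');

-- This query's IN-list matches the index predicate, so the
-- planner can use the (much smaller) partial index.
SELECT DISTINCT user_id
FROM user_event
WHERE what IN ('signup', 'purchase');
```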

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Martijn Tonies (Upscene Productions)
use those indices, instead of using a generic index on 'what' and optionally other columns? With regards, Martijn Tonies Upscene Productions http://www.upscene.com -Original Message- From: Samuel Williams Sent: Wednesday, April 19, 2017 6:01 AM To: pgsql-general Subject: [

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread vinny
On 2017-04-19 09:48, John R Pierce wrote: On 4/19/2017 12:31 AM, vinny wrote: Given the number of records, my first thought was either partitioning or partial-indexes. The fewer rows are in the index, the quicker it will be to check, and it's not a lot of work to create separate indexes for lat

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Samuel Williams
Thanks Alban, I appreciate your ideas and thoughts. I'm a little reluctant to go down the partitioning route as I think we'll probably end up sharding out horizontally using citus data in the near future and naive postgres partitioning may hamper that effort. It's interesting you talk about using

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Alban Hertroys
> On 19 Apr 2017, at 6:01, Samuel Williams > wrote: > > Hi. > > We have 400,000,000 records in a table (soon to be 800,000,000), here > is the schema (\d+) > > https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121 > > We want the following kinds of query to be fast: > > SELECT D

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread Samuel Williams
> Did that 50% performance gain come from just the datatype, or that fact that > the index became smaller? How would one measure this? On 19 April 2017 at 19:48, John R Pierce wrote: > On 4/19/2017 12:31 AM, vinny wrote: >> >> Given the number of records, my first thought was either partitionin

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread John R Pierce
On 4/19/2017 12:31 AM, vinny wrote: Given the number of records, my first thought was either partitioning or partial-indexes. The fewer rows are in the index, the quicker it will be to check, and it's not a lot of work to create separate indexes for lat/long ranges or dates. that only works i

Re: [GENERAL] Large data and slow queries

2017-04-19 Thread vinny
On 2017-04-19 07:04, Samuel Williams wrote: Thanks John. Yes, you are absolutely right, you want the index to be bottom heavy so you can cull as much as possible at the top. I'm familiar with that, once implementing a brute-force sudoku solver, it has the same principle. I've been working on thi

Re: [GENERAL] Large data and slow queries

2017-04-18 Thread Samuel Williams
Thanks John. Yes, you are absolutely right, you want the index to be bottom heavy so you can cull as much as possible at the top. I'm familiar with that, once implementing a brute-force sudoku solver, it has the same principle. I've been working on this all afternoon. By reducing the longitude, la

Re: [GENERAL] Large data and slow queries

2017-04-18 Thread John R Pierce
On 4/18/2017 9:01 PM, Samuel Williams wrote: We want the following kinds of query to be fast: SELECT ... AND (latitude > -37.03079375089291 AND latitude < -36.67086424910709 AND longitude > 174.6307139779924 AND longitude < 175.0805140220076); I wonder if GIST would work better if you use the
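The GiST idea being floated here can be sketched without PostGIS, using the built-in `point` and `box` geometric types: index an expression over the two columns, then express the bounding box with the contained-in operator instead of four comparisons.

```sql
-- Functional GiST index over a point built from the two float columns.
CREATE INDEX user_event_location_gist
    ON user_event USING gist (point(longitude, latitude));

-- The four range predicates become a single box-containment test,
-- using the same bounds as the query in the original post.
SELECT user_id FROM user_event
WHERE point(longitude, latitude)
      <@ box(point(174.6307, -37.0308), point(175.0805, -36.6709));
```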

Re: [GENERAL] Large data and slow queries

2017-04-18 Thread Samuel Williams
Oh, I've also tried earth distance and ll_to_earth in a GIST index... it was slower than the BTREE index on a small subset of data in my tests. On 19 April 2017 at 16:01, Samuel Williams wrote: > Hi. > > We have 400,000,000 records in a table (soon to be 800,000,000), here > is the schema (\d+) >
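For reference, the earthdistance approach being compared is the standard cube/earthdistance pattern: a GiST index over `ll_to_earth`, queried with `earth_box` for radius searches (the 10 km radius below is illustrative).

```sql
CREATE EXTENSION IF NOT EXISTS cube;
CREATE EXTENSION IF NOT EXISTS earthdistance;

CREATE INDEX user_event_earth_idx
    ON user_event USING gist (ll_to_earth(latitude, longitude));

-- All rows within roughly 10 km of a reference point.
SELECT user_id FROM user_event
WHERE earth_box(ll_to_earth(-36.85, 174.76), 10000)
      @> ll_to_earth(latitude, longitude);
```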

[GENERAL] Large data and slow queries

2017-04-18 Thread Samuel Williams
Hi. We have 400,000,000 records in a table (soon to be 800,000,000), here is the schema (\d+) https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121 We want the following kinds of query to be fast: SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE "user_event"."what" IN
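One possible index for the query shape described in this opening post (the full query is truncated in the preview, so the column order here is a guess, not the poster's actual design): equality/IN on `what` first, then one range column, with `user_id` included so the `DISTINCT user_id` can be answered by an index-only scan once the visibility map is current.

```sql
-- Btree serving: IN-filter on "what", range scan on latitude,
-- in-index filter on longitude, user_id read from the index itself.
-- Only the first range column (latitude) narrows the btree scan;
-- longitude is filtered within the matched index entries.
CREATE INDEX user_event_query_idx
    ON user_event (what, latitude, longitude, user_id);
```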