Citus is also now just an extension.
Sounds pretty awesome, I'll certainly consider your system if/when we
decide to make changes.
On 22 April 2017 at 08:41, Andrew Staller wrote:
Samuel,
Short answer to your questions: (1) TimescaleDB and CitusDB are focusing
on solving different problems, and (2) TimescaleDB is an Apache 2-licensed
extension to run in your Postgres database, not a fork or different system.
Longer answer to your first question:
From what we've read and
Scott, Vick, the vast majority of the data is generic, but there are
some specific events we need to look up quickly, probably fewer than a
few hundred thousand records. We did evaluate partial indexes vs
full indexes. The partial index speeds up our specific queries
significantly while only taking
Andrew, how would timescaledb compare to citus - and is timescaledb an
extension to postgres or is it an entirely separate system?
On 21 April 2017 at 02:44, Andrew Staller wrote:
On Thu, Apr 20, 2017 at 6:30 AM, Vick Khera wrote:
Awesome thread.
Samuel,
Just wanted you to be aware of the work we're doing at TimescaleDB (
http://www.timescale.com/), a time-series database extension for PostgreSQL.
Some of how we might help you:
- automatic partitioning by space (primary key - like country_id, for
instance) and time. This
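The automatic partitioning Andrew describes is driven through TimescaleDB's SQL API. A minimal sketch, assuming a `created_at` timestamp column (the real column is in the schema gist, and the exact `create_hypertable` arguments vary by TimescaleDB version):

```sql
-- Load the extension into an existing Postgres database.
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Convert a plain table into a hypertable partitioned by time,
-- with an optional "space" partitioning key such as country_id.
SELECT create_hypertable('user_event', 'created_at',
                         partitioning_column => 'country_id',
                         number_partitions   => 4);
```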
I'm curious why you have so many partial indexes. Are you trying to make
custom indexes per query? It seems to me you might want to consider making
the indexes general, and remove the redundant ones (that have the same
prefix list of indexed fields).
Secondly, your table is 102 GB. Clearly there's a
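Vick's point about redundant prefix indexes can be sketched like this (index and column names are hypothetical):

```sql
-- The first index is a prefix of the second, so the second can
-- serve every query the first can; the first is redundant.
CREATE INDEX user_event_what_idx      ON user_event (what);
CREATE INDEX user_event_what_user_idx ON user_event (what, user_id);

-- Dropping the prefix index saves disk space and write overhead.
DROP INDEX user_event_what_idx;
```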
Okay, so after changing longitude/latitude to float4, and
re-organizing the table a bit, I got the query down from about 8
minutes to 40 seconds.
The details are in the gist comments:
https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121
Now, just need to get performance another 2 ord
> On 19 Apr 2017, at 12:58, Samuel Williams
> wrote:
>
> It's interesting you talk about using multiple indexes. In
> MySQL/MariaDB and derivatives, I've never seen the query planner
> consider using multiple indexes. So, it's possible that Postgres may
> use multiple indexes if it saves time?
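To the question above: yes, Postgres can combine several indexes in a single query via bitmap index scans. A sketch (table, columns, and the plan shape are illustrative; the planner decides based on statistics):

```sql
CREATE INDEX t_a_idx ON t (a);
CREATE INDEX t_b_idx ON t (b);

-- With suitable statistics the planner may AND the two bitmaps:
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;
-- Bitmap Heap Scan on t
--   ->  BitmapAnd
--         ->  Bitmap Index Scan on t_a_idx
--         ->  Bitmap Index Scan on t_b_idx
```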
Ah right, yeah, it's insert only. So, it's never been vacuumed.
On 20 April 2017 at 01:25, Stephen Frost wrote:
Greetings,
* Samuel Williams (space.ship.travel...@gmail.com) wrote:
Thanks for all the suggestions Stephen.
> That explain analyze shows a whole ton of heap fetches. When was the
> last time a VACUUM was run on this table, to build the visibility map?
> Without the visibility map being current, an Index-Only Scan, as is
> happening here, can really suck.
I'm using the
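Since the table is insert-only it may never have been autovacuumed (in the PostgreSQL versions current at the time, plain inserts did not trigger autovacuum), so the visibility map is empty and index-only scans fall back to heap fetches. A manual pass fixes that:

```sql
-- Build the visibility map so index-only scans can skip heap fetches.
VACUUM (ANALYZE) user_event;

-- Check how many pages are marked all-visible afterwards:
SELECT relallvisible, relpages
FROM pg_class
WHERE relname = 'user_event';
```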
Greetings,
* Samuel Williams (space.ship.travel...@gmail.com) wrote:
> We want the following kinds of query to be fast:
"kinds of query" isn't helpful, you should be reviewing exactly the
queries you care about because statistics and your exact data set and
what the exact query you're running is
Hello Vinny, Samuel,
Perhaps I'm missing something, but I'd be interested in the reasoning
behind this.
For column 'what', it seems you have no index on all values, only
indices with specific values for 'what'.
How does this speed up the search? Will PostgreSQL use those indices,
instead of using a generic index on 'what' and optionally other columns?
On 2017-04-19 13:25, Martijn Tonies (Upscene Productions) wrote:
Martijn that is a good question. It's because we are only concerned
with a subset of events for this index and this particular query. The
query planner can recognise this and use the index correctly. By doing
this, we reduce the size of the index significantly. In the best case,
where we only wante
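Samuel's approach can be sketched like this (the event names and columns are hypothetical; the real schema is in the linked gist):

```sql
-- Index only the subset of 'what' values this query cares about;
-- the index stays small because most rows are excluded.
CREATE INDEX user_event_signup_idx
    ON user_event (user_id)
    WHERE what IN ('signup', 'purchase');

-- The planner will use the partial index when the query's WHERE
-- clause provably implies the index predicate:
SELECT DISTINCT user_id
FROM user_event
WHERE what IN ('signup', 'purchase');
```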
On 2017-04-19 09:48, John R Pierce wrote:
On 4/19/2017 12:31 AM, vinny wrote:
Thanks Alban, I appreciate your ideas and thoughts.
I'm a little reluctant to go down the partitioning route, as I think
we'll probably end up sharding out horizontally using Citus in the
near future, and naive Postgres partitioning may hamper that effort.
It's interesting you talk about using
> On 19 Apr 2017, at 6:01, Samuel Williams
> wrote:
> Did that 50% performance gain come from just the datatype, or the fact
> that the index became smaller?
How would one measure this?
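One way to measure it is to compare sizes directly before and after the datatype change (index name hypothetical):

```sql
-- Size of one index:
SELECT pg_size_pretty(pg_relation_size('user_event_location_idx'));

-- All indexes on the table vs. the heap itself:
SELECT pg_size_pretty(pg_indexes_size('user_event')) AS index_bytes,
       pg_size_pretty(pg_table_size('user_event'))   AS heap_bytes;
```

Separating the two effects would mean building both index variants on the same data and timing the query against each.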
On 19 April 2017 at 19:48, John R Pierce wrote:
On 4/19/2017 12:31 AM, vinny wrote:
Given the number of records, my first thought was either partitioning
or partial-indexes.
The fewer rows are in the index, the quicker it will be to check,
and it's not a lot of work to create separate indexes for lat/long
ranges or dates.
that only works i
On 2017-04-19 07:04, Samuel Williams wrote:
Thanks John. Yes, you are absolutely right, you want the index to be
bottom heavy so you can cull as much as possible at the top. I'm
familiar with that; I once implemented a brute-force sudoku solver
that relies on the same principle.
I've been working on this all afternoon. By reducing the longitude,
la
On 4/18/2017 9:01 PM, Samuel Williams wrote:
We want the following kinds of query to be fast:
SELECT ... AND (latitude > -37.03079375089291 AND latitude <
-36.67086424910709 AND longitude > 174.6307139779924 AND longitude <
175.0805140220076);
I wonder if GIST would work better if you use the
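One GiST variant John may have in mind is indexing the coordinates as a single point and searching with a bounding box (a sketch; whether it beats the btree depends on the data):

```sql
-- Expression index over a point built from the two float columns.
CREATE INDEX user_event_point_idx
    ON user_event
    USING gist (point(longitude, latitude));

-- Bounding-box search with the "contained in box" operator;
-- box corners are (longitude, latitude) pairs from the query above.
SELECT count(*)
FROM user_event
WHERE point(longitude, latitude)
      <@ box(point(174.6307, -37.0308), point(175.0805, -36.6709));
```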
Oh, I've also tried earth distance and ll_to_earth in a GIST index...
it was slower than the BTREE index on a small subset of data in my
tests.
On 19 April 2017 at 16:01, Samuel Williams
wrote:
Hi.
We have 400,000,000 records in a table (soon to be 800,000,000), here
is the schema (\d+)
https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121
We want the following kinds of query to be fast:
SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE
"user_event"."what" IN