Re: [GENERAL] What limits Postgres performance when the whole database lives in cache?

2016-09-08 Thread Nicolas Grilly
On Thu, Sep 8, 2016 at 2:35 AM, dandl wrote: > I understand that. What I'm trying to get a handle on is the magnitude of > that cost and how it influences other parts of the product, specifically > for Postgres. If the overhead for perfect durability were (say) 10%, few > people would care about

Re: [GENERAL] Clustered index to preserve data locality in a multitenant application?

2016-09-06 Thread Nicolas Grilly
On Tue, Aug 30, 2016 at 1:10 PM, Nicolas Grilly wrote: > We are developing a multitenant application which is currently based on > MySQL, but we're thinking of migrating to PostgreSQL. > > We rely on clustered indexes to preserve data locality for each tenant. > Primary

Re: [GENERAL] Clustered index to preserve data locality in a multitenant application?

2016-09-02 Thread Nicolas Grilly
On Thu, Sep 1, 2016 at 3:08 PM, Igor Neyman wrote: > Don’t know about plans to implement clustered indexes in PostgreSQL. > It was discussed on the mailing list in the past. I found an interesting thread dated from 2012 about integrating pg_reorg (the ancestor of pg_repack) in PostgreSQL core:

Re: [GENERAL] Clustered index to preserve data locality in a multitenant application?

2016-08-31 Thread Nicolas Grilly
On Thu, Sep 1, 2016 at 12:31 AM, Nicolas Grilly wrote: > In DB2, it seems possible to define a "clustering index" that determines > how rows are physically ordered in the "table space" (the heap). > > The documentation says: "When a table has a clustering in

Re: [GENERAL] Clustered index to preserve data locality in a multitenant application?

2016-08-31 Thread Nicolas Grilly
On Tue, Aug 30, 2016 at 8:17 PM, Kenneth Marshall wrote: > We have been using the extension pg_repack to keep a table groomed into > cluster order. With an appropriate FILLFACTOR to keep updates on the same > page, it works well. The issue is that it needs space to rebuild the new > index/table.

Re: [GENERAL] Clustered index to preserve data locality in a multitenant application?

2016-08-31 Thread Nicolas Grilly
On Thu, Sep 1, 2016 at 12:05 AM, Ben Chobot wrote: > If what they did 3 years ago is similar to what you are trying to do > today, who cares what they are doing today? (Besides using pg_repack > instead of pg_reorg, of course.) > I'm curious because, in the meantime, Instagram could have stopped

Re: [GENERAL] Clustered index to preserve data locality in a multitenant application?

2016-08-31 Thread Nicolas Grilly
On Tue, Aug 30, 2016 at 8:17 PM, Kenneth Marshall wrote: > We have been using the extension pg_repack to keep a table groomed into > cluster order. With an appropriate FILLFACTOR to keep updates on the same > page, it works well. The issue is that it needs space to rebuild the new > index/table.

Re: [GENERAL] Clustered index to preserve data locality in a multitenant application?

2016-08-31 Thread Nicolas Grilly
On Wed, Aug 31, 2016 at 6:05 PM, Kenneth Marshall wrote: > We just run it via cron. In our case, we run it once a day, but depending > on > your churn, it could be run once a week or more. > Could you provide some numbers: what is the size of the tables or tables that are repacked? how long does

Re: [GENERAL] Clustered index to preserve data locality in a multitenant application?

2016-08-31 Thread Nicolas Grilly
Mike Sofen wrote: > For Nicolas’s situation, that would require 10,000 partitions – not very > useful, and each partition would be very small. > This is exactly my conclusion about using partitions in my situation. > In Postgres, as you mentioned, clustering is a “one time” operation but > only

Re: [GENERAL] Clustered index to preserve data locality in a multitenant application?

2016-08-31 Thread Nicolas Grilly
Eduardo Morras wrote: > Check BRIN indexs, they are "designed for handling very large tables in > which certain columns have some natural correlation with their physical > location within the table", I think they fit your needs. Yes, a BRIN index on the tenant ID would be very useful if the row

Re: [GENERAL] Clustered index to preserve data locality in a multitenant application?

2016-08-31 Thread Nicolas Grilly
On Tue, Aug 30, 2016 at 8:17 PM, Kenneth Marshall wrote: > We have been using the extension pg_repack to keep a table groomed into > cluster order. With an appropriate FILLFACTOR to keep updates on the same > page, it works well. The issue is that it needs space to rebuild the new > index/table.

Re: [GENERAL] Clustered index to preserve data locality in a multitenant application?

2016-08-30 Thread Nicolas Grilly
On Tue, Aug 30, 2016 at 7:26 PM, Vick Khera wrote: > I'll assume you have an index on the tenant ID. In that case, your > queries will be pretty fast. > > On some instances, we have multi-column indexes starting with the > tenant ID, and those are used very effectively as well. > > I never worry

[GENERAL] Clustered index to preserve data locality in a multitenant application?

2016-08-30 Thread Nicolas Grilly
rows, but most of the time, only one row will match the tenant ID, other rows belonging to other tenants). A few questions: - Am I missing something? - Am I overestimating the benefit of a clustered index in our case, and the cost of not having one in PostgreSQL? - Is there another technical solu

[GENERAL] Clustered index to preserve data locality in a multitenant application?

2016-08-30 Thread Nicolas Grilly
rows, but most of the time, only one row will match the tenant ID, other rows belonging to other tenants). A few questions: - Am I missing something? - Am I overestimating the benefit of a clustered index in our case, and the cost of not having one in PostgreSQL? - Is there another technical so

Re: [GENERAL] Full text search ranking: ordering using index and proximiti ranking with OR queries

2012-11-16 Thread Nicolas Grilly
ltext internals. On Wed, Mar 7, 2012 at 8:05 PM, Nicolas Grilly wrote: > In a previous discussion thread, Oleg suggested that ts_rank is unable to > use GIN indices: > http://archives.postgresql.org/pgsql-general/2011-07/msg00351.php > > This is the only information I have about this. >

Re: [GENERAL] Full text search ranking: ordering using index and proximiti ranking with OR queries

2012-03-07 Thread Nicolas Grilly
In a previous discussion thread, Oleg suggested that ts_rank is unable to use GIN indices: http://archives.postgresql.org/pgsql-general/2011-07/msg00351.php This is the only information I have about this. On Wed, Mar 7, 2012 at 18:59, Andrey Chursin wrote: > Is there any way to sort by ranking,

Re: [GENERAL] ts_rank seems very slow (140 ranked documents / second on my machine)

2012-03-07 Thread Nicolas Grilly
is stored in TOAST table, which triggers a random read for each ranked document. Cheers, Nicolas Grilly On Wed, Jul 13, 2011 at 18:55, Nicolas Grilly wrote: > The first query ran in 347 seconds; the second one in 374 seconds. > Conclusion: There is no significant overhead in the ts_rank

Re: [GENERAL] Full text search ranking: ordering using index and proximiti ranking with OR queries

2012-03-07 Thread Nicolas Grilly
t; so i need self-written postprocessing of query to replace OR with AND. > > -- > Regards, > Andrey > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -

Re: [GENERAL] ts_rank seems very slow (140 ranked documents / second on my machine)

2011-07-13 Thread Nicolas Grilly
On Wed, Jul 13, 2011 at 17:36, Oleg Bartunov wrote: > I didn't notice, reading 40K tuples in random order takes a long time and > this > is a problem of any database. Can you measure time to read all documents > found ? As you asked, I measured the time required to read all documents. For refere

Re: [GENERAL] ts_rank seems very slow (140 ranked documents / second on my machine)

2011-07-13 Thread Nicolas Grilly
Hi Oleg and all, On Wed, Jul 13, 2011 at 08:16, Oleg Bartunov wrote: > there is problem with estimating of cost scanning gin index in < 9.1 > versions, > so you can set enable_seqscan=off; > or try 9.1 which beta3 now. I re-ran my queries using enable seqscan=off. Now the first query, without t

Re: [GENERAL] ts_rank seems very slow (140 ranked documents / second on my machine)

2011-07-12 Thread Nicolas Grilly
On Tue, Jul 12, 2011 at 22:25, Oleg Bartunov wrote: > I don't see your query uses index :) Yes, I know. :) I ran VACUUM ANALYZE and re-ran the query but the output of EXPLAIN ANALYZE stays exactly the same: no index used. Any idea why? By the way, does ts_rank is supposed to use a GIN index wh

[GENERAL] ts_rank seems very slow (140 ranked documents / second on my machine)

2011-07-12 Thread Nicolas Grilly
chine! I'm afraid this is because ts_rank needs to read document_vector, and because that column is stored in TOAST table, it triggers a random access for each matching row. Am I correct? Is it the expected behavior? Is there a way to reduce the execution time? I use PostgreSQL 8.4

Re: [GENERAL] Why "copy ... from stdio" does not return immediately when reading invalid data?

2011-02-07 Thread Nicolas Grilly
ate. Maybe we can make a special case for the COPY FROM subprotocol and handle errors early, in order to make them available to PQgetResult? Is is feasible in a simple way or is it a bad idea? Regards, Nicolas Grilly On Wed, Feb 2, 2011 at 20:06, John R Pierce wrote: > On 02/02/11 10:20 AM, N

[GENERAL] Why "copy ... from stdio" does not return immediately when reading invalid data?

2011-02-02 Thread Nicolas Grilly
and is there a way to be notified of a potential error before calling PQputCopyEnd? Or do I have to send my data in small chunks (for example batch of 1 rows), issue a PQputCopyEnd, check for errors, and continue with the next chunk? Thanks for your help and advice. Regards, Nicolas Grilly

[GENERAL] Using the copy protocol, is there a way to be notified of a potential error before sending?

2011-02-01 Thread Nicolas Grilly
your help and advice. Regards, Nicolas Grilly