Re: [PERFORM] cluster index on a table

2009-07-17 Thread ph...@apra.asso.fr
Hi all,

>On Wed, Jul 15, 2009 at 10:36 PM, Scott Marlowe wrote:

I'd love to see it.
>

> +1 for index organized tables 
>

>--Scott

+1 also for me...

I am currently working for a large customer who is migrating their main 
application to PostgreSQL. The application currently uses DB2 and RFM-II 
(an RDBMS used on Bull GCOS 8 mainframes). With both RDBMSs, "cluster indexes" 
are used and data rows are stored according to these indexes. The benefits 
are:
- a good performance level, especially for batch chains that more or less 
"scan" a lot of large tables,
- table reorganisations that remain fairly infrequent (about once a month).
To keep a good performance level with PostgreSQL, I expect that we will need 
more frequent reorganisation operations, with the drawbacks this creates for 
the production schedules. This is one of the very few regressions we need to 
address (or maybe the only one).

Despite my currently limited knowledge of the postgres internals, I don't see 
why it should be difficult to simply adapt the logic used to determine the data 
row location at insert time, using something like:
- read the cluster index to find the tid of the row whose key value is just 
below the key value of the row to insert,
- if there is enough room in that same page (thanks to FILLFACTOR or to 
previous row deletions), use it,
- else use the first available space found through the fsm.
This doesn't change anything in the MVCC mechanism, doesn't change index 
structure and management, and doesn't require moving any data row.
This does not ensure that all rows are always in the "right" order, but if 
the FILLFACTOR is correctly set, most rows are well placed, requiring less 
frequent reorganisation.
But I am probably missing something ;-)
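
To make the three steps above concrete, here is a minimal toy simulation in 
Python. It is only a sketch under stated assumptions, not PostgreSQL code: the 
class ClusteredHeap, the constant PAGE_CAPACITY, the sorted list standing in 
for the cluster index and the linear search standing in for the fsm are all 
hypothetical names and simplifications, and MVCC, deletes and variable row 
sizes are ignored.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical number of rows that fit in one heap page


class ClusteredHeap:
    """Toy heap whose inserts try to follow the order of a cluster index."""

    def __init__(self, fillfactor=0.5):
        self.fillfactor = fillfactor  # fraction of each page filled at load time
        self.pages = []               # pages[n] is the list of keys stored in page n
        self.index = []               # sorted (key, page_no) pairs: the "cluster index"

    def bulk_load(self, keys):
        """Stand-in for CLUSTER: store keys in order, leaving free room per page."""
        per_page = max(1, int(PAGE_CAPACITY * self.fillfactor))
        ordered = sorted(keys)
        self.pages = [ordered[i:i + per_page]
                      for i in range(0, len(ordered), per_page)]
        self.index = [(k, n) for n, page in enumerate(self.pages) for k in page]

    def _first_page_with_room(self):
        """Stand-in for the fsm: first page with a free slot, else a new page."""
        for n, page in enumerate(self.pages):
            if len(page) < PAGE_CAPACITY:
                return n
        self.pages.append([])
        return len(self.pages) - 1

    def insert(self, key):
        """Insert key and return the number of the page it was stored in."""
        # Step 1: find the row whose key is just below the new key.
        pos = bisect.bisect_left(self.index, (key,))
        target = self.index[pos - 1][1] if pos > 0 else None
        # Step 2: use that row's page if it still has room.
        # Step 3: otherwise fall back to the first page with free space.
        if target is None or len(self.pages[target]) >= PAGE_CAPACITY:
            target = self._first_page_with_room()
        self.pages[target].append(key)
        bisect.insort(self.index, (key, target))
        return target
```

For instance, after heap = ClusteredHeap(fillfactor=0.5) and 
heap.bulk_load([10, 20, 30, 40]), the calls heap.insert(25) and 
heap.insert(15) both land in page 0 next to their neighbours; once that page 
is full, heap.insert(12) falls back to the first page with room, which is 
where the ordering starts to degrade.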

Regards. Philippe Beaudoin.


[PERFORM] maintain_cluster_order_v5.patch

2009-10-19 Thread ph...@apra.asso.fr
Hi all,

The current discussion about "Indexes on low cardinality columns" led me to 
discover the "grouped index tuples" patch 
(http://community.enterprisedb.com/git/) and its associated 
"maintain cluster order" patch 
(http://community.enterprisedb.com/git/maintain_cluster_order_v5.patch).

This last patch seems to cover the TODO item named "Automatically maintain 
clustering on a table". 
As this patch is not so new (2007), I would like to know why it has not yet 
been integrated into a standard version of PG (not finalized enough? not 
reliable enough? not matching the way the core team would like to address this 
item?) and whether there is a good chance of seeing it committed in the near 
future.

I currently work for a large customer who is migrating a lot of databases used 
by an application that currently benefits greatly from well clustered 
tables, especially for batch processing. The migration brings a lot of benefits. 
In fact, the only regression, compared to the old RDBMS, is that the degree of 
table organisation degrades more quickly, which leads to more frequent 
heavy CLUSTER operations. 

So this "maintain cluster order" patch (and maybe "git" also) should fill the 
gap. But straying from the "standard PG" is not a very attractive option...

Regards. 
Philippe Beaudoin.





-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] maintain_cluster_order_v5.patch

2009-10-21 Thread ph...@apra.asso.fr
Hi Jeff,

>> Hi all,
>> 
>> The current discussion about "Indexes on low cardinality columns" let
>> me discover this 
>> "grouped index tuples" patch (http://community.enterprisedb.com/git/)
>> and its associated 
>> "maintain cluster order" patch
>> (http://community.enterprisedb.com/git/maintain_cluster_order_v5.patch)
>> 
>> This last patch seems to cover the TODO item named "Automatically
>> maintain clustering on a table".
>
>The TODO item isn't clear about whether the order should be strictly
>maintained, or whether it should just make an effort to keep the table
>mostly clustered. The patch mentioned above makes an effort, but does
>not guarantee cluster order.
>
You are right, there are 2 different visions: a strictly maintained order or a 
possibly maintained order.
The latter is already a good enhancement, as it considerably lengthens the 
interval between 2 CLUSTER operations, in particular if the FILLFACTOR is 
properly set. In terms of performance, having 99% of rows in the "right" page 
is not really worse than having totally optimized storage. 
The only benefit of a strictly maintained order is that there is no need for 
CLUSTER at all, which could be very interesting for very large databases with 
24-hour availability constraints.
For our needs, the "possibly maintained order" is enough.

>> As this patch is not so new (2007), I would like to know why it has
>> not been yet integrated in a standart version of PG (not well
>> finalized ? not totaly sure ? not corresponding to the way the core
>> team would like to address this item ?) and if there are good chance
>> to see it committed in a near future.
>
>Search the archives on -hackers for discussion. I don't think either of
>these features were rejected, but some of the work and benchmarking have
>not been completed.
OK, I will have a look.
>
>If you can help (either benchmark work or C coding), try reviving the
>features by testing them and merging them with the current tree.
OK, those are the rules of the game in such a community.
I am not a good C programmer, but I will see what I can do.

> I recommend reading the discussion first, to see if there are any major
>problems.

>
>Personally, I'd like to see the GIT feature finished as well. When I
>have time, I was planning to take a look into it.
>
>Regards,
>   Jeff Davis


