[HACKERS] Index only scans
Hi Heikki, Pgsql-Hackers, Sometime back you have started with "Separate Heap Fetch from Index Scan" which was planned to support partial index only scans. Are you still working on it or do you know someone still working on it? We did some development with Gokul's Index Only Patch and have got good performance numbers which are as follows: Test table constitutes 0.5 billion records with thick index on (id,aid) on three machines {pg_normal , pg_enhanced( PGSQL with thick index feature ), Oracle} each having 16 Gb Ram. Disk I/O obtained using sar. testdb=# \d test Table "public.test" Column | Type | Modifiers ---+--+--- id| integer | startdate | date | enddate | date | charge| double precision | firstname | text | lastname | text | aid | double precision | bid | double precision | Indexes: "taid" THICK btree (id, aid) CLUSTER Index size On oracle 15.20 Gb On Pg-normal14.73 Gb Pg_enhanced 23.17 Gb (16bytes*0.5billion = ~7.6 GB) PFA excel sheet for details. In general we saw fair amount of performance improvement, but one thing that surprises us is that after around 20% tuples updated we found oracle taking more time. Regards, Shrish Purohit |Senior Software Engineer|Persistent Systems shrish_puro...@persistent.co.in |Cell:+91-9850-959-940|Tel:+91(20)302-34493 Innovation in software product design, development and delivery- www.persistentsys.com DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails. Thick_index_perf_numbers.xlsx Description: Thick_index_perf_numbers.xlsx -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extension of Thick Indexes
Hi All, Some brief information about the thick index patch. The patch adds snapshot (MVCC) information to the indexes to enable them being used independently. With this information, the indexes need not refer to the heap data to check an index key's visibility. Various functions such as IndexTupleSatisfiesMVCC() are uesd to verify if the MVCC satisfies a snapshot. The following data structure is added to the thick index key data structure: struct SnapshotFieldsData { TransactionId t_xmin; TransactionId t_xmax; TransactionId t_cid; uint16 infomask; }; Thus the pg_index catalog is added with flag to indicate thick index. (FormData_pg_index ) is modified accordingly. The IndexScanDescData structure, which stores information about how an index can be scanned, was modified to include: SnapshotIndexParams sindex_params; // params of thick index that can be used for scanning void *opaque; The following data data-structure is used by the database execution engine to scan thick index: typedef struct SnapshotIndexParamsData { bool isIndexOnlyScan; // true if index should be used without accessing the table. bool is_minmax_scan; // true if this the query contains min/max aggregate functions. bool is_count_only_scan; // if the query contains only count(), then true. // If true, then index rows are not cached. bool cache_stack; // If true, remembers path from root to leaf in the stack (a member // var defined below). Used For optimization of updates of a row, which // requires delete and insert. // While deleting a key from index, its stack is tracked, and the same stack // is used for inserting the updated key. CmdType operation; // insert, update, delete, . HeapTuplehtup; // reference to the heap tuple that's being operated ScanKey insertion_skey; /* Insertion Scan Key */ void* stack; // reference to the stack mentioned in comments of cache_stack. TupleTableSlot* slot; // stores heap data structures containing index keys } SnapshotIndexParamsData; IndexScan data structure, which stores information about how an index is scanned, is updated to include following members: Bool indexOnlyScan; Bool is_minmax_scan; Bool is_count_only_scan; The Postgres optimizer figures out various ways to access each relation used by the input SQL statement in set_plain_rel_pathlist() function, which indirectly invokes create_index_path(). Create_index_path() does the following: Identifies index path to access a table, figures out if index covers all the columns used by the SQL statement to initialize index_covers_all_clauses sets the cost of accessing table using index (or just using index only scan) by invoking cost_index() function. Functions index_covers_rel_clauses, index_covers_having_quals, index_covers_target_list are used to set flag index_covers_all_clauses. index_covers_all_clauses flag indicates that index covers all clauses and can be used In IndexOnlyScan mode. --How updates and deletes are performed? find_old_slot(estate) identifies the old_slot information from estates' ExprContexts. ExecUpdate uses oldslot, newslot and function ExecUpdateIndexTuples is used to update the index. ExecUpdateIndexTuples prepares iscan IndexScanDescriptor which is used to scan index. Index is scanded by index_getnext(iscan,ForwardScanDirection).for b-tree index it internall calls _bt_readpage. for update and delete operations _bt_readpage calls SnapshotIndexDeleteTuple which updates mvcc information of the index tuple based on MVCC information includeed in the slot. ExecUpdateIndexTuples then inserts the new index tuple. FormIndexDatum stores the MVCC information along with values []. index_form_tuple is also modified accordingly to store MVCC information. Functions StoreMinimalIndexTuple and GetNextMinimalIndexTuple are used to store index key column values into mintup and to get values from mintup respectively. For countOnlyScans we count same static slot, while for minmax_scan first non-null tuple as the tuple with minium/maximum value. Thanks, Shrish Purohit | Software Engineer | Persistent Systems shrish_puro...@persistent.co.in | Cell: +91 98509 59940 | Tel: +91 (20) 3023 4493 Innovation in software product design, development and delivery- www.persistentsys.com -Original Message- From: Heikki Linnakangas [mailto:heikki.linnakan...@enterprisedb.com] Sent: Friday, March 20, 2009 2:01 PM To: Gokulakannan Somasundaram Cc: Alvaro Herrera; Josh Berkus; Shrish Purohit; Amit Gupta; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Extension of Thick Indexes Gokulakannan Somasundaram wrote: >> It would be helpful to explain how this solves the lack of atomicity of >> visibility data updates, wh
[HACKERS] Expression based index
HI All, I had gone through advantages of visibilitymap along with discussion on thick index. I have a question. when we have information about visibility of the data, does it make sense to propagate values of expression stored in index through various slots of execution tree? It would be helpful when index is created on immutable function or expression. I am adding additional columns in the heap tuple returned by index with the expression values stored in the index. For index_only_scan while projecting the values we will treat expression as normal column. It is simple for queries with single table. But, when we have joins, we will need to modify plan to treat expression as normal column. This will involve modifying plan, intermediate slots. Modification in the plan need to be performed after finalizing plan, as we will be able to identify whether we can treat the expression as value only then. Can there be any other alternative? Thanks, Shrish
[HACKERS] Extension of Thick Indexes
HI All, I had gone through advantages of visibilitymap along with discussion on thick index. I have a question. when we have information about visibility of the data, does it make sense to propagate values of expression stored in index through various slots of execution tree? It would be helpful when index is created on immutable function or expression. I am adding additional columns in the heap tuple returned by index with the expression values stored in the index. For index_only_scan while projecting the values we will treat expression as normal column. It is simple for queries with single table. But, when we have joins, we will need to modify plan to treat expression as normal column. This will involve modifying plan, intermediate slots. Modification in the plan need to be performed after finalizing plan, as we will be able to identify whether we can treat the expression as value only then. Can there be any other alternative? Thanks, Shrish
[HACKERS] how can we propagate fk constraints?
Hi All, continued from discussion mail http://archives.postgresql.org/pgsql-hackers/2009-06/msg00971.php > > >> - Am I reading the patch correctly that you do not end up creating >> indexes on the children tables? That is a big problem! > > > So what expected is when an index is created on Partitioned table it should automatically create indexes on individual partitions with same attributes, - name of the index on partition can be base index name appended with tableoid, - tablespace can be the same that of partition or tablespace specified in create index statement. Similarly we need to propagate foreign keys from partitioned table to individual partitions. so when we add new partition, how to create fkconstraint structure based on pg_constraint tuple? a Get the foreign key constraints from pg_constraint tuples for partitioned table, heap_modify_tuple to store relationid of newly added partition, Insert the new tuple directly into pg_constraint assuming required checks are happened at the time creating constraint on base table. update dependencies OR b call ATAddForeignKeyConstraint with recreated FkConstraint based on pg_constraint tuple. Is there any better way? In both cases I have to create FkConstraint structure, Creating FkConstraints will also be helpful in adding FkTriggers. I am not sure about how to create fkconstraint->pk_attrs and fkconstraint->fk_attrs? probably we can use something similar to decompile_column_index_array. We can get remaing variables from pg_constraint tuple. any suggestions appreciated. Thanks, Shrish