On 10/26/21 8:49 AM, Hayk Manukyan wrote:
> ok. here is the deal if I have the following index with 6 column
>
> CREATE INDEX ON job_nlp_year_scan (job, nlp, year, scan_id, issue_flag,
> sequence);
>
> I need to specify all 6 columns in where clause in order to fully use
> this index.
What do you mean by "fully use this index"? Yes, the query may use just
some of the columns and there will be a bit of overhead, but I doubt
it'll be measurable.
> It will not be efficient in cases when I have 4 condition in where
> clause also I should follow the order of columns.
So, do some experiments and show us what the difference is. Create an
index on the 4 and 6 columns, and measure timings for a query with just
the 4 columns.
> In case of INCLUDE the 3 columns just will be in index but will not be
> structured as index so it will have affect only if In select I will have
> that 6 columns nothing more.
>
> In my case I have table with ~15 columns
> In my application I have to do a lot of queries with following where
> clauses
>
> 1. where job = <something> and nlp = <something> and year = <something>
> and SCAN_ID = <something>
> 2. where job = <something> and nlp = <something> and year = <something>
> and ISSUE_FLAG = <something>
> 3. where job = <something> and nlp = <something> and year = <something>
> and SEQUENCE = <something>
>
> I don't want to index just on job, nlp, year because for each job,
> nlp, year I have approximately 5000-7000 rows ,
> overall table have ~50m rows so it is partitioned by job as well. So if
> I build 3 separate indexes it will be huge resource.
> So I am thinking of having one index which will be job, nlp, year and
> the 4-th layer will be other columns not just included but also in
> B-tree structure.
> To visualize it will be something like this:
> image.png
> The red part is ordinary index with nested b-trees ant the yellow part
> is adaptive part so depends on
> where clause optimizer can decide which direction (leaf, b-tree
> whatever) to chose.
> In this case I will have one index and will manage red part only once
> for all three cases.
> Those it make sense ?
If I get what you propose, you want to have a "top" tree for (job, nlp,
year), which "splits" the data set into subsets of ~5000-7000 rows. And
then for each subset you want a separate "small" trees on each of the
other columns, so in this case three trees.
Well, the problem with this is pretty obvious - each of the small trees
requires separate copies of the leaf pages. And remember, in a btree the
internal pages are usually less than 1% of the index, so this pretty
much triples the size of the index. And if you insert a row into the
index, it has to insert the item pointer into each of the small trees,
likely requiring a separate I/O for each.
So I'd bet this is not any different from just having three separate
indexes - it doesn't save space, doesn't save I/O, nothing.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company