RE: Problem with default partition pruning
Hi Thibaut,

Thanks a lot for your test and comments.

> Le 28/02/2019 à 09:26, Imai, Yoshikazu a écrit :
> > Hosoya-san
> >
> > On Wed, Feb 27, 2019 at 6:51 AM, Yuzuko Hosoya wrote:
> >>> From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> >>> Sent: Wednesday, February 27, 2019 11:22 AM
> >>>
> >>> Hosoya-san,
> >>>
> >>> On 2019/02/22 17:14, Yuzuko Hosoya wrote:
> >>>> Hi,
> >>>>
> >>>> I found a bug in default partition pruning when executing a range query.
> >>>>
> >>>> postgres=# create table test1(id int, val text) partition by range (id);
> >>>> postgres=# create table test1_1 partition of test1 for values from (0) to (100);
> >>>> postgres=# create table test1_2 partition of test1 for values from (150) to (200);
> >>>> postgres=# create table test1_def partition of test1 default;
> >>>>
> >>>> postgres=# explain select * from test1 where id > 0 and id < 30;
> >>>>                          QUERY PLAN
> >>>> ------------------------------------------------------------
> >>>>  Append  (cost=0.00..11.83 rows=59 width=11)
> >>>>    ->  Seq Scan on test1_1  (cost=0.00..5.00 rows=58 width=11)
> >>>>          Filter: ((id > 0) AND (id < 30))
> >>>>    ->  Seq Scan on test1_def  (cost=0.00..6.53 rows=1 width=12)
> >>>>          Filter: ((id > 0) AND (id < 30))
> >>>> (5 rows)
> >>>>
> >>>> There is no need to scan the default partition, but it is scanned.
> >>>>
> >>>> In the current implementation, whether the default partition is
> >>>> scanned or not is determined according to each condition of the given
> >>>> WHERE clause at get_matching_range_bounds().  In this example,
> >>>> scan_default is set true according to id > 0 because id >= 200 matches
> >>>> the default partition.  Similarly, according to id < 30, scan_default
> >>>> is set true.  Then, these results are combined according to AND/OR at
> >>>> perform_pruning_combine_step().  In this case, the final result's
> >>>> scan_default is set true.
> >>>>
> >>>> The modifications I made are as follows:
> >>>> - get_matching_range_bounds() determines only the offsets of range
> >>>>   bounds according to each condition
> >>>> - These results are combined at perform_pruning_combine_step()
> >>>> - Whether the default partition is scanned or not is determined at
> >>>>   get_matching_partitions()
> >>>>
> >>>> Attached the patch.  Any feedback is greatly appreciated.
> >>>
> >>> Thank you for reporting.  Can you please add this to the March CF in
> >>> the Bugs category so as not to lose track of this?
> >>>
> >>> I will try to send review comments soon.
> >>>
> >> Thank you for your reply.  I added this to the March CF.
> >
> > I tested with a simple use case and I confirmed it works correctly like
> > below.
> >
> > In the case of using a BETWEEN clause:
> > postgres=# create table test1(id int, val text) partition by range (id);
> > postgres=# create table test1_1 partition of test1 for values from (0) to (100);
> > postgres=# create table test1_2 partition of test1 for values from (150) to (200);
> > postgres=# create table test1_def partition of test1 default;
> >
> > [HEAD]
> > postgres=# explain analyze select * from test1 where id between 0 and 50;
> >                                QUERY PLAN
> > -------------------------------------------------------------------------
> >  Append  (cost=0.00..58.16 rows=12 width=36) (actual time=0.008..0.008 rows=0 loops=1)
> >    ->  Seq Scan on test1_1  (cost=0.00..29.05 rows=6 width=36) (actual time=0.005..0.005 rows=0 loops=1)
> >          Filter: ((id >= 0) AND (id <= 50))
> >    ->  Seq Scan on test1_def  (cost=0.00..29.05 rows=6 width=36) (actual time=0.002..0.002 rows=0 loops=1)
> >          Filter: ((id >= 0) AND (id <= 50))
> >
> > [patched]
> > postgres=# explain analyze select * from test1 where id between 0 and 50;
> >                                QUERY PLAN
> > -------------------------------------------------------------------------
RE: Problem with default partition pruning
Hi Amit-san,

From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
Sent: Monday, March 18, 2019 6:44 PM
> Hosoya-san,
>
> On 2019/03/15 15:05, Yuzuko Hosoya wrote:
> > Indeed, it's problematic.  I also tested and found that this problem
> > occurred when no partition matched the WHERE clauses.  So the following
> > query didn't work correctly.
> >
> > # explain select * from test1_3 where (id > 0 and id < 30);
> >                          QUERY PLAN
> > ------------------------------------------------------------
> >  Append  (cost=0.00..58.16 rows=12 width=36)
> >    ->  Seq Scan on test1_3_1  (cost=0.00..29.05 rows=6 width=36)
> >          Filter: ((id > 0) AND (id < 30))
> >    ->  Seq Scan on test1_3_2  (cost=0.00..29.05 rows=6 width=36)
> >          Filter: ((id > 0) AND (id < 30))
> > (5 rows)
> >
> > I created a new patch to handle this problem, and confirmed the query
> > you mentioned works as expected.
> >
> > # explain select * from test1 where (id > 0 and id < 30) or (id > 220 and id < 230);
> >                          QUERY PLAN
> > ------------------------------------------------------------
> >  Append  (cost=0.00..70.93 rows=26 width=36)
> >    ->  Seq Scan on test1_1_1  (cost=0.00..35.40 rows=13 width=36)
> >          Filter: (((id > 0) AND (id < 30)) OR ((id > 220) AND (id < 230)))
> >    ->  Seq Scan on test1_3_1  (cost=0.00..35.40 rows=13 width=36)
> >          Filter: (((id > 0) AND (id < 30)) OR ((id > 220) AND (id < 230)))
> > (5 rows)
> >
> > v2 patch attached.  Could you please check it again?
>
> I think the updated patch breaks the promise that get_matching_range_bounds
> won't set scan_default based on individual pruning value comparisons.  How
> about the attached delta patch that applies on top of your earlier v1
> patch, which fixes the issue reported by Thibaut?
>
Indeed.  I agree with your proposal.  Also, I confirmed your patch works
correctly.

Best regards,
Yuzuko Hosoya
RE: Problem with default partition pruning
Hi,

Thanks a lot for the additional tests and the new patch.

> Le 20/03/2019 à 10:06, Amit Langote a écrit :
> > Hi Thibaut,
> >
> > On 2019/03/19 23:58, Thibaut Madelaine wrote:
> >> I kept on testing with sub-partitioning.
> >
> > Thanks.
> >
> >> I found a case, using 2 default partitions, where a default partition
> >> is not pruned:
> >>
> >> create table test2(id int, val text) partition by range (id);
> >> create table test2_20_plus_def partition of test2 default;
> >> create table test2_0_20 partition of test2 for values from (0) to (20)
> >>   partition by range (id);
> >> create table test2_0_10 partition of test2_0_20 for values from (0) to (10);
> >> create table test2_10_20_def partition of test2_0_20 default;
> >>
> >> # explain (costs off) select * from test2 where id=5 or id=25;
> >>             QUERY PLAN
> >> -----------------------------------
> >>  Append
> >>    ->  Seq Scan on test2_0_10
> >>          Filter: ((id = 5) OR (id = 25))
> >>    ->  Seq Scan on test2_10_20_def
> >>          Filter: ((id = 5) OR (id = 25))
> >>    ->  Seq Scan on test2_20_plus_def
> >>          Filter: ((id = 5) OR (id = 25))
> >> (7 rows)
> >>
> >> I have the same output using Amit's v1-delta.patch or Hosoya's
> >> v2_default_partition_pruning.patch.
> >
> > I think I've figured out what may be wrong.
> >
> > Partition pruning step generation code should ignore any arguments of
> > an OR clause that won't be true for a sub-partitioned partition, given
> > its partition constraint.
> >
> > In this case, id = 25 contradicts test2_0_20's partition constraint
> > (which is, a IS NOT NULL AND a >= 0 AND a < 20), so the OR clause
> > should really be simplified to id = 5, ignoring the id = 25 argument.
> > Note that we remove id = 25 only for the considerations of pruning and
> > not from the actual clause that's passed to the final plan, although
> > it wouldn't be a bad idea to try to do that.
> >
> > Attached revised delta patch, which includes the fix described above.
> >
> > Thanks,
> > Amit
>
> Amit, I tested many cases with nested range sub-partitions... and I did
> not find any problem with your last patch :-)
>
> I tried mixing with hash partitions with no problems.
>
> From the patch, there seem to be fewer checks than before.  I cannot
> think of a case that could have performance impacts.
>
> Hosoya-san, if you agree with Amit's proposal, do you think you can send
> a patch unifying your default_partition_pruning.patch and Amit's second
> v1-delta.patch?
>
I understood Amit's proposal.  But I think the issue Thibaut reported would
occur regardless of whether the clauses contain OR clauses or not, as
follows.  I tested a query which should output "One-Time Filter: false".

# explain select * from test2_0_20 where id = 25;
                              QUERY PLAN
-------------------------------------------------------------------
 Append  (cost=0.00..25.91 rows=6 width=36)
   ->  Seq Scan on test2_10_20_def  (cost=0.00..25.88 rows=6 width=36)
         Filter: (id = 25)

As Amit described in the previous email, id = 25 contradicts test2_0_20's
partition constraint, so I think this clause should be ignored, and we can
also handle this case in a similar way to Amit's proposal.

I attached v1-delta-2.patch which fixes the above issue.  What do you think
about it?

Best regards,
Yuzuko Hosoya

v1-delta-2.patch
Description: Binary data
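Amit's idea of ignoring OR arms that contradict a sub-partitioned table's partition constraint can be sketched with a toy model (plain Python with invented names, not PostgreSQL's predicate_refuted_by(); only `id = <const>` arms against a single range constraint are handled, which is all the example needs):

```python
# Simplified sketch -- not PostgreSQL's predicate_refuted_by().  test2_0_20's
# constraint is (id IS NOT NULL AND id >= 0 AND id < 20), modeled here as a
# half-open integer range.
CONSTRAINT_LO, CONSTRAINT_HI = 0, 20      # test2_0_20's range

def refuted_by_constraint(eq_val):
    """An 'id = eq_val' arm contradicts the constraint iff eq_val lies outside it."""
    return not (CONSTRAINT_LO <= eq_val < CONSTRAINT_HI)

or_arms = [5, 25]                         # WHERE id = 5 OR id = 25
useful_arms = [v for v in or_arms if not refuted_by_constraint(v)]
# useful_arms == [5]: pruning inside test2_0_20 only needs to consider id = 5,
# so the default sub-partition test2_10_20_def can be pruned, because 5 is
# covered by test2_0_10's range [0, 10).
```

Under this model the simplification is done only for pruning purposes; as noted above, the full OR clause would still be passed to the final plan as the filter.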
RE: Problem with default partition pruning
Hi,

> Hi,
>
> On 2019/03/23 2:36, Thibaut Madelaine wrote:
> > I tested your last patch and, if I didn't mix up patches at the end of a
> > too long week, I get a problem when querying the sub-sub partition:
> >
> > test=# explain select * from test2_0_10 where id = 25;
> >                         QUERY PLAN
> > ------------------------------------------------------------
> >  Seq Scan on test2_0_10  (cost=0.00..25.88 rows=6 width=36)
> >    Filter: (id = 25)
> > (2 rows)
>
> The problem here is not really related to partition pruning, but another
> problem I recently sent an email about:
>
> https://www.postgresql.org/message-id/9813f079-f16b-61c8-9ab7-4363cab28d80%40lab.ntt.co.jp
>
> The problem in this case is that *constraint exclusion* is not working,
> because the partition constraint is not loaded by the planner.  Note that
> pruning is only used if a query specifies the parent table, not a
> partition.

Thanks for the comments.  I saw that email.  Also, I checked that the query
Thibaut mentioned worked correctly with Amit's patch discussed in that
thread.

Best regards,
Yuzuko Hosoya
RE: Problem with default partition pruning
Hi,

> Maybe we should have two patches as we seem to be improving two things:
>
> 1. Patch to fix problems with default partition pruning originally
>    reported by Hosoya-san
>
> 2. Patch to determine if a given clause contradicts a sub-partitioned
>    table's partition constraint, fixing problems unearthed by Thibaut's
>    tests

I attached the latest patches according to Amit's comment.
v3_default_partition_pruning.patch fixes the default partition pruning
problems, and ignore_contradictory_where_clauses_at_partprune_step.patch
fixes the sub-partition problems Thibaut tested.

Best regards,
Yuzuko Hosoya

ignore_contradictory_where_clauses_at_partprune_step.patch
Description: Binary data

v3_default_partition_pruning.patch
Description: Binary data
RE: Problem with default partition pruning
Amit-san,

Thanks for the comments.

> Thanks for dividing the patches that way.
>
> Would it be a good idea to add some new test cases to these patches, just
> so it's easily apparent what we're changing?

Yes, I agree with you.

> So, we could add the test case presented by Thibaut at the following link
> to default_partition_pruning.patch:
>
> https://www.postgresql.org/message-id/a4968068-6401-7a9c-8bd4-6a3bc9164a86%40dalibo.com
>
> And, another reported at the following link to
> ignore_contradictory_where_clauses_at_partprune_step.patch:
>
> https://www.postgresql.org/message-id/bd03f475-30d4-c4d0-3d7f-d2fbde755971%40dalibo.com
>
> Actually, it might be possible/better to construct the test queries in
> partition_prune.sql using the existing tables in that script, that is,
> without defining new tables just for adding the new test cases.  If not,
> maybe it's OK to create the new tables too.

I see.  I added some test cases to each patch according to the tests
discussed in this thread.

However, I found another problem as follows.  This query should output
"One-Time Filter: false" because rlp3's constraints contradict the WHERE
clause.

postgres=# \d+ rlp3
                              Partitioned table "public.rlp3"
 Column |       Type        | Collation | Nullable | Default | Storage  | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
 b      | character varying |           |          |         | extended |              |
 a      | integer           |           |          |         | plain    |              |
Partition of: rlp FOR VALUES FROM (15) TO (20)
Partition constraint: ((a IS NOT NULL) AND (a >= 15) AND (a < 20))
Partition key: LIST (b varchar_ops)
Partitions: rlp3abcd FOR VALUES IN ('ab', 'cd'),
            rlp3efgh FOR VALUES IN ('ef', 'gh'),
            rlp3nullxy FOR VALUES IN (NULL, 'xy'),
            rlp3_default DEFAULT

postgres=# explain select * from rlp3 where a = 2;
                           QUERY PLAN
---------------------------------------------------------------
 Append  (cost=0.00..103.62 rows=24 width=36)
   ->  Seq Scan on rlp3abcd  (cost=0.00..25.88 rows=6 width=36)
         Filter: (a = 2)
   ->  Seq Scan on rlp3efgh  (cost=0.00..25.88 rows=6 width=36)
         Filter: (a = 2)
   ->  Seq Scan on rlp3nullxy  (cost=0.00..25.88 rows=6 width=36)
         Filter: (a = 2)
   ->  Seq Scan on rlp3_default  (cost=0.00..25.88 rows=6 width=36)
         Filter: (a = 2)
(9 rows)

I think the place of the contradiction check was wrong in
ignore_contradictory_where_clauses_at_partprune_step.patch, so I fixed it.

Attached the latest patches.  Please check them again.

Best regards,
Yuzuko Hosoya

v2_ignore_contradictory_where_clauses_at_partprune_step.patch
Description: Binary data

v4_default_partition_pruning.patch
Description: Binary data
RE: Problem with default partition pruning
Amit-san,

> -----Original Message-----
> From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> Sent: Friday, April 05, 2019 6:47 PM
> To: Yuzuko Hosoya ; 'Thibaut' ; 'Imai, Yoshikazu'
> Cc: 'PostgreSQL Hackers'
> Subject: Re: Problem with default partition pruning
>
> Hosoya-san,
>
> On 2019/04/04 13:00, Yuzuko Hosoya wrote:
> > I added some test cases to each patch according to the tests discussed
> > in this thread.
>
> Thanks a lot.
>
> > However, I found another problem as follows.  This query should output
> > "One-Time Filter: false" because rlp3's constraints contradict the
> > WHERE clause.
> >
> > postgres=# \d+ rlp3
> >                               Partitioned table "public.rlp3"
> >  Column |       Type        | Collation | Nullable | Default | Storage  | Stats target | Description
> > --------+-------------------+-----------+----------+---------+----------+--------------+-------------
> >  b      | character varying |           |          |         | extended |              |
> >  a      | integer           |           |          |         | plain    |              |
> > Partition of: rlp FOR VALUES FROM (15) TO (20)
> > Partition constraint: ((a IS NOT NULL) AND (a >= 15) AND (a < 20))
> > Partition key: LIST (b varchar_ops)
> > Partitions: rlp3abcd FOR VALUES IN ('ab', 'cd'),
> >             rlp3efgh FOR VALUES IN ('ef', 'gh'),
> >             rlp3nullxy FOR VALUES IN (NULL, 'xy'),
> >             rlp3_default DEFAULT
> >
> > postgres=# explain select * from rlp3 where a = 2;
> >                           QUERY PLAN
> > ---------------------------------------------------------------
> >  Append  (cost=0.00..103.62 rows=24 width=36)
> >    ->  Seq Scan on rlp3abcd  (cost=0.00..25.88 rows=6 width=36)
> >          Filter: (a = 2)
> >    ->  Seq Scan on rlp3efgh  (cost=0.00..25.88 rows=6 width=36)
> >          Filter: (a = 2)
> >    ->  Seq Scan on rlp3nullxy  (cost=0.00..25.88 rows=6 width=36)
> >          Filter: (a = 2)
> >    ->  Seq Scan on rlp3_default  (cost=0.00..25.88 rows=6 width=36)
> >          Filter: (a = 2)
> > (9 rows)
>
> This one too would be solved with the other patch I mentioned, which
> fixes get_relation_info() to load the partition constraint so that
> constraint exclusion can use it.
>
> The partition in the earlier example given by Thibaut is a leaf
> partition, whereas rlp3 above is a sub-partitioned partition, but both
> are partitions nonetheless.
>
> Fixing partprune.c like we're doing with
> v2_ignore_contradictory_where_clauses_at_partprune_step.patch only works
> for the latter, because only partitioned tables visit partprune.c.
>
> OTOH, the other patch only applies to situations where
> constraint_exclusion = on.
>
I see.  I think the following example discussed in this thread before would
also be solved with your patch, not
v2_ignore_contradictory_where_clauses_at_partprune_step.patch.

postgres=# set constraint_exclusion to on;
postgres=# explain select * from test2_0_20 where id = 25;
                 QUERY PLAN
-------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

> > I think that the place of the contradiction check was wrong in
> > ignore_contradictory_where_clauses_at_partprune_step.patch, so I fixed
> > it.
>
> Thanks.  The patch contains some whitespace noise:
>
> $ git diff --check
> src/backend/partitioning/partprune.c:790: trailing whitespace.
> +	 * given its partition constraint, we can ignore it,
> src/backend/partitioning/partprune.c:791: trailing whitespace.
> +	 * that is not try to pass it to the pruning code.
> src/backend/partitioning/partprune.c:792: trailing whitespace.
> +	 * We should do that especially to avoid pruning code
> src/backend/partitioning/partprune.c:810: trailing whitespace.
> +
> src/test/regress/sql/partition_prune.sql:87: trailing whitespace.
> +-- where clause contradicts sub-partition's constraint
>
> Can you please fix it?
>
Thanks for checking.  I'm attaching the latest patch.

> BTW, now I'm a bit puzzled between whether this case should be fixed by
> hacking on partprune.c like this patch does, or whether to work on
> getting the other patch committed and expect users to set
> constraint_exclusion = on for this to behave as expected.  The original
> intention of setting parti
RE: Problem with default partition pruning
Horiguchi-san,

Thanks for your comments.

> -----Original Message-----
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> Sent: Tuesday, April 09, 2019 10:33 AM
> To: hosoya.yuz...@lab.ntt.co.jp
> Cc: langote_amit...@lab.ntt.co.jp; thibaut.madela...@dalibo.com;
> imai.yoshik...@jp.fujitsu.com; pgsql-hackers@lists.postgresql.org
> Subject: Re: Problem with default partition pruning
>
> Sigh..
>
> At Tue, 09 Apr 2019 10:28:48 +0900 (Tokyo Standard Time), Kyotaro
> HORIGUCHI wrote in
> <20190409.102848.252476604.horiguchi.kyot...@lab.ntt.co.jp>
> > As a second thought: the partition constraint is not a constraint
> > expression, so it's fair to apply partqual ignoring
> > constraint_exclusion.  The variable is set false to skip useless
> > expression evaluation on all partitions, but partqual should be
> > evaluated just once.  Sorry for my confusion.
> >
> > So it is still wrong that the new code is added in
> > gen_partprune_steps_internal.
>
> So it is still wrong that the new code is added at the beginning of the
> loop on clauses in gen_partprune_steps_internal.
>
> > If partqual results true and the clause is long, the partqual is
> > evaluated uselessly at every recursion.
> >
> > Maybe we should do that when we find that the current clause doesn't
> > match part attributes.  Specifically just after the for loop
> > "for (i = 0; i < part_scheme->partnattrs; i++)".
>
I think we should check whether the WHERE clause contradicts the partition
constraint even when the clause matches part attributes.  So I moved the
"if (partqual)" block to the beginning of the loop you mentioned.

I'm attaching the latest version.  Could you please check it again?

Best regards,
Yuzuko Hosoya

v4_ignore_contradictory_where_clauses_at_partprune_step.patch
Description: Binary data
Re: Problem with default partition pruning
Horiguchi-san,

> -----Original Message-----
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> Sent: Tuesday, April 09, 2019 5:37 PM
> To: hosoya.yuz...@lab.ntt.co.jp
> Cc: langote_amit...@lab.ntt.co.jp; thibaut.madela...@dalibo.com;
> imai.yoshik...@jp.fujitsu.com; pgsql-hackers@lists.postgresql.org
> Subject: Re: Problem with default partition pruning
>
> Hi.
>
> At Tue, 9 Apr 2019 16:41:47 +0900, "Yuzuko Hosoya" wrote in
> <00cf01d4eea7$afa43370$0eec9a50$@lab.ntt.co.jp>
> > > So it is still wrong that the new code is added at the beginning of
> > > the loop on clauses in gen_partprune_steps_internal.
> > >
> > > > If partqual results true and the clause is long, the partqual is
> > > > evaluated uselessly at every recursion.
> > > >
> > > > Maybe we should do that when we find that the current clause
> > > > doesn't match part attributes.  Specifically just after the for
> > > > loop "for (i = 0; i < part_scheme->partnattrs; i++)".
> > >
> > I think we should check whether the WHERE clause contradicts the
> > partition constraint even when the clause matches part attributes.  So
> > I moved
>
> Why?  If the clauses contain a clause on a partition key, the clause is
> involved in the determination of whether a partition survives or not in
> the ordinary way.  Could you show how, or on what configuration (tables
> and query), it happens that such a matching clause needs to be checked
> against partqual?
>
In Thibaut's tests, we found that partition pruning didn't work as expected
when we scanned a sub-partition using a WHERE clause which contradicts the
sub-partition's constraint.  The example discussed in this thread is as
follows.

postgres=# \d+ test2
                           Partitioned table "public.test2"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
 id     | integer |           |          |         | plain    |              |
 val    | text    |           |          |         | extended |              |
Partition key: RANGE (id)
Partitions: test2_0_20 FOR VALUES FROM (0) TO (20), PARTITIONED,
            test2_20_plus_def DEFAULT

postgres=# \d+ test2_0_20
                        Partitioned table "public.test2_0_20"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
 id     | integer |           |          |         | plain    |              |
 val    | text    |           |          |         | extended |              |
Partition of: test2 FOR VALUES FROM (0) TO (20)
Partition constraint: ((id IS NOT NULL) AND (id >= 0) AND (id < 20))
Partition key: RANGE (id)
Partitions: test2_0_10 FOR VALUES FROM (0) TO (10),
            test2_10_20_def DEFAULT

postgres=# explain (costs off) select * from test2 where id=5 or id=20;
            QUERY PLAN
-----------------------------------
 Append
   ->  Seq Scan on test2_0_10
         Filter: ((id = 5) OR (id = 20))
   ->  Seq Scan on test2_10_20_def
         Filter: ((id = 5) OR (id = 20))
   ->  Seq Scan on test2_20_plus_def
         Filter: ((id = 5) OR (id = 20))
(7 rows)

postgres=# explain (costs off) select * from test2_0_20 where id=25;
          QUERY PLAN
-------------------------------
 Seq Scan on test2_10_20_def
   Filter: (id = 25)
(2 rows)

So I think we have to check if the WHERE clause contradicts the
sub-partition's constraint regardless of whether the clause matches part
attributes or not.

> The "if (partconstr)" block uselessly runs for every clause in the clause
> tree other than BoolExpr.  If we want to do that, isn't just doing
> predicate_refuted_by(partconstr, clauses, false) sufficient before
> looping over clauses?

Yes, I tried doing that in the original patch.

> > > > "if (partqual)" block to the beginning of the loop you mentioned.
> > > >
> > > > I'm attaching the latest version.  Could you please check it again?
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center

Best regards,
Yuzuko Hosoya
Runtime pruning problem
Hi all,

I found a runtime pruning test case which may be a problem, as follows:

create table t1 (id int, dt date) partition by range(dt);
create table t1_1 partition of t1 for values from ('2019-01-01') to ('2019-04-01');
create table t1_2 partition of t1 for values from ('2019-04-01') to ('2019-07-01');
create table t1_3 partition of t1 for values from ('2019-07-01') to ('2019-10-01');
create table t1_4 partition of t1 for values from ('2019-10-01') to ('2020-01-01');

In this example, current_date is 2019-04-16.

postgres=# explain select * from t1 where dt = current_date + 400;
                         QUERY PLAN
------------------------------------------------------------
 Append  (cost=0.00..198.42 rows=44 width=8)
   Subplans Removed: 3
   ->  Seq Scan on t1_1  (cost=0.00..49.55 rows=11 width=8)
         Filter: (dt = (CURRENT_DATE + 400))
(4 rows)

postgres=# explain analyze select * from t1 where dt = current_date + 400;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Append  (cost=0.00..198.42 rows=44 width=8) (actual time=0.000..0.001 rows=0 loops=1)
   Subplans Removed: 3
   ->  Seq Scan on t1_1  (cost=0.00..49.55 rows=11 width=8) (never executed)
         Filter: (dt = (CURRENT_DATE + 400))
 Planning Time: 0.400 ms
 Execution Time: 0.070 ms
(6 rows)

I realized t1_1 was not actually scanned, since "never executed" was
displayed in the plan using EXPLAIN ANALYZE.  But I think "One-Time Filter:
false" and "Subplans Removed: ALL", or something like that, should be
displayed instead.

What do you think?

Best regards,
Yuzuko Hosoya
NTT Open Source Software Center
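The situation above can be modeled with a small sketch (plain Python with invented names, not the executor's code; it assumes the four range partitions and the example date given above): CURRENT_DATE + 400 is a stable expression that is only computable at execution time, and with these dates it matches none of the four partitions, so in principle every subplan could be removed rather than three of four.

```python
import datetime

# Hypothetical model of t1's range partitions, half-open [lo, hi).
PARTS = {
    't1_1': (datetime.date(2019, 1, 1), datetime.date(2019, 4, 1)),
    't1_2': (datetime.date(2019, 4, 1), datetime.date(2019, 7, 1)),
    't1_3': (datetime.date(2019, 7, 1), datetime.date(2019, 10, 1)),
    't1_4': (datetime.date(2019, 10, 1), datetime.date(2020, 1, 1)),
}

def prune(dt):
    """Partitions surviving 'WHERE dt = <dt>'."""
    return [name for name, (lo, hi) in PARTS.items() if lo <= dt < hi]

# At executor startup, CURRENT_DATE + 400 becomes computable; with the
# example's date (2019-04-16) the target falls in 2020, outside every
# partition, so no subplan at all needs to run.
survivors = prune(datetime.date(2019, 4, 16) + datetime.timedelta(days=400))
# survivors == [], yet EXPLAIN shows "Subplans Removed: 3" with t1_1 kept
# (though marked "never executed" under EXPLAIN ANALYZE).
```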
A typo in partprune.c
Hi,

I found a typo in the comment of get_matching_range_bounds.

 /*
- * get_matching_range_datums
+ * get_matching_range_bounds
  *		Determine the offsets of range bounds matching the specified values,
  *		according to the semantics of the given operator strategy

Here is the patch to fix it.

Best regards,
Yuzuko Hosoya
NTT Open Source Software Center

fix_partprune_typo.patch
Description: Binary data
RE: A typo in partprune.c
Hi Michael,

> From: Michael Paquier [mailto:mich...@paquier.xyz]
> Sent: Thursday, November 08, 2018 8:17 PM
>
> On Thu, Nov 08, 2018 at 07:19:14PM +0900, Yuzuko Hosoya wrote:
> > Here is the patch to fix it.
>
> Thanks, committed.

Thank you.

Yuzuko Hosoya
NTT Open Source Software Center
Problem with default partition pruning
Hi,

I found a bug in default partition pruning when executing a range query.

postgres=# create table test1(id int, val text) partition by range (id);
postgres=# create table test1_1 partition of test1 for values from (0) to (100);
postgres=# create table test1_2 partition of test1 for values from (150) to (200);
postgres=# create table test1_def partition of test1 default;

postgres=# explain select * from test1 where id > 0 and id < 30;
                         QUERY PLAN
------------------------------------------------------------
 Append  (cost=0.00..11.83 rows=59 width=11)
   ->  Seq Scan on test1_1  (cost=0.00..5.00 rows=58 width=11)
         Filter: ((id > 0) AND (id < 30))
   ->  Seq Scan on test1_def  (cost=0.00..6.53 rows=1 width=12)
         Filter: ((id > 0) AND (id < 30))
(5 rows)

There is no need to scan the default partition, but it is scanned.

In the current implementation, whether the default partition is scanned or
not is determined according to each condition of the given WHERE clause at
get_matching_range_bounds().  In this example, scan_default is set true
according to id > 0 because id >= 200 matches the default partition.
Similarly, according to id < 30, scan_default is set true.  Then, these
results are combined according to AND/OR at perform_pruning_combine_step().
In this case, the final result's scan_default is set true.

The modifications I made are as follows:
- get_matching_range_bounds() determines only the offsets of range bounds
  according to each condition
- These results are combined at perform_pruning_combine_step()
- Whether the default partition is scanned or not is determined at
  get_matching_partitions()

Attached the patch.  Any feedback is greatly appreciated.

Best regards,
---
Yuzuko Hosoya
NTT Open Source Software Center

default_partition_pruning.patch
Description: Binary data
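The difference between the per-clause decision and the proposed deferred decision can be illustrated with a toy model (plain Python, not the actual partprune.c code; the `covered()` helper and the integer-interval representation are invented for illustration): non-default partitions cover [0, 100) and [150, 200), and everything else belongs to the default partition.

```python
# Toy model of the report above -- NOT PostgreSQL's code.  Non-default
# partitions cover the half-open ranges [0, 100) and [150, 200); any other
# id goes to the default partition.
RANGES = [(0, 100), (150, 200)]

def covered(lo, hi, ranges):
    """True if every integer in [lo, hi] falls inside some range."""
    x = lo
    for rlo, rhi in sorted(ranges):
        if x < rlo:
            return False        # gap before this range -> default needed
        if x < rhi:
            x = rhi             # ranges are half-open [rlo, rhi)
        if x > hi:
            return True
    return x > hi

# Per-clause decision (the buggy behavior): each clause alone "needs" the
# default partition, and AND-combining the booleans keeps it.
needs_default_gt0 = not covered(1, 10**9, RANGES)     # id > 0: ids >= 200
needs_default_lt30 = not covered(-10**9, 29, RANGES)  # id < 30: ids < 0
buggy_scan_default = needs_default_gt0 and needs_default_lt30   # True

# Proposed fix: combine the clause intervals first, decide default once.
# id > 0 AND id < 30 -> [1, 29], fully covered by [0, 100).
fixed_scan_default = not covered(1, 29, RANGES)       # False
```

Under this model the default partition survives only when the combined interval is not fully covered by the non-default ranges, which matches the patch's idea of deferring the scan_default decision to get_matching_partitions().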
RE: Problem with default partition pruning
Amit-san,

> From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> Sent: Wednesday, February 27, 2019 11:22 AM
>
> Hosoya-san,
>
> On 2019/02/22 17:14, Yuzuko Hosoya wrote:
> > Hi,
> >
> > I found a bug in default partition pruning when executing a range query.
> >
> > postgres=# create table test1(id int, val text) partition by range (id);
> > postgres=# create table test1_1 partition of test1 for values from (0) to (100);
> > postgres=# create table test1_2 partition of test1 for values from (150) to (200);
> > postgres=# create table test1_def partition of test1 default;
> >
> > postgres=# explain select * from test1 where id > 0 and id < 30;
> >                          QUERY PLAN
> > ------------------------------------------------------------
> >  Append  (cost=0.00..11.83 rows=59 width=11)
> >    ->  Seq Scan on test1_1  (cost=0.00..5.00 rows=58 width=11)
> >          Filter: ((id > 0) AND (id < 30))
> >    ->  Seq Scan on test1_def  (cost=0.00..6.53 rows=1 width=12)
> >          Filter: ((id > 0) AND (id < 30))
> > (5 rows)
> >
> > There is no need to scan the default partition, but it is scanned.
> >
> > In the current implementation, whether the default partition is scanned
> > or not is determined according to each condition of the given WHERE
> > clause at get_matching_range_bounds().  In this example, scan_default
> > is set true according to id > 0 because id >= 200 matches the default
> > partition.  Similarly, according to id < 30, scan_default is set true.
> > Then, these results are combined according to AND/OR at
> > perform_pruning_combine_step().  In this case, the final result's
> > scan_default is set true.
> >
> > The modifications I made are as follows:
> > - get_matching_range_bounds() determines only the offsets of range
> >   bounds according to each condition
> > - These results are combined at perform_pruning_combine_step()
> > - Whether the default partition is scanned or not is determined at
> >   get_matching_partitions()
> >
> > Attached the patch.  Any feedback is greatly appreciated.
>
> Thank you for reporting.  Can you please add this to the March CF in the
> Bugs category so as not to lose track of this?
>
> I will try to send review comments soon.
>
Thank you for your reply.  I added this to the March CF.

Regards,
Yuzuko Hosoya
NTT Open Source Software Center
Improve selectivity estimate for range queries
Hi,

I found a problem with selectivity estimates for range queries on master, as
follows.  This is my test case:

postgres=# CREATE TABLE test(id int, val text);
CREATE TABLE
postgres=# INSERT INTO test SELECT i, 'testval' FROM generate_series(0,2)i;
INSERT 0 3
postgres=# VACUUM analyze test;
VACUUM
postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE id > 0 and id < 1;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on test  (cost=0.00..613.00 rows=150 width=12) (actual time=0.044..21.371 rows= loops=1)
   Filter: ((id > 0) AND (id < 1))
   Rows Removed by Filter: 20001
 Planning Time: 0.099 ms
 Execution Time: 37.142 ms
(5 rows)

Although the actual number of rows was far larger, the estimated number of
rows was 150.

So I dug in to see what happened, and I think the following part of
clauselist_selectivity caused this problem.

    /*
     * Now scan the rangequery pair list.
     */
    while (rqlist != NULL)
    {
        RangeQueryClause *rqnext;

        if (rqlist->have_lobound && rqlist->have_hibound)
        {
            /* Successfully matched a pair of range clauses */
            Selectivity s2;

            /*
             * Exact equality to the default value probably means the
             * selectivity function punted.  This is not airtight but
             * should be good enough.
             */
            if (rqlist->hibound == DEFAULT_INEQ_SEL ||
                rqlist->lobound == DEFAULT_INEQ_SEL)
            {
                s2 = DEFAULT_RANGE_INEQ_SEL;
            }
            else
            {
                s2 = rqlist->hibound + rqlist->lobound - 1.0;

In my environment, the selectivity for id > 0 was 0.1, and the selectivity
for id < 1 was 0.1.  Then, the values of rqlist->hibound and
rqlist->lobound are set accordingly.  On the other hand, DEFAULT_INEQ_SEL
is 0.3333333333333333.  As a result, the final selectivity was computed as
DEFAULT_RANGE_INEQ_SEL, since the condition of the second if statement was
satisfied.  However, these selectivities were computed according to the
statistics, so the final selectivity should have been calculated from
rqlist->hibound + rqlist->lobound - 1.0.

My test case might be uncommon, but I think it could cause some problems.

To handle such cases, I've thought up an idea based on a previous commit[1]
which solved a similar problem related to DEFAULT_NUM_DISTINCT.  Just like
that modification, I added a flag showing whether the selectivity was
calculated according to the statistics or not to clauselist_selectivity,
and used it as the condition of the if statement instead of
rqlist->hibound == DEFAULT_INEQ_SEL and rqlist->lobound == DEFAULT_INEQ_SEL.
I am afraid that the changes were more than I had expected, but we can
estimate selectivity correctly.

Patch attached.  Do you have any thoughts?

[1] https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=4c2777d0b733220d9029f78817af8ce

Best regards,
Yuzuko Hosoya
NTT Open Source Software Center

improve_selectivity_estimate_for_range_queries.patch
Description: Binary data
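The pitfall and the isdefault-flag idea can be modeled in a few lines (a sketch with invented names, not the actual clauselist_selectivity code; the 30000-row uniform distribution is an assumption chosen so that one statistics-derived selectivity lands exactly on DEFAULT_INEQ_SEL):

```python
# Sketch of the issue and the proposed fix -- invented names, not the actual
# clauselist_selectivity code.  Assumes a 30000-row table with ids 0..29999,
# so that the histogram-derived sel(id < 10000) is exactly 1/3, which
# collides with DEFAULT_INEQ_SEL.
DEFAULT_INEQ_SEL = 0.3333333333333333
DEFAULT_RANGE_INEQ_SEL = 0.005
NROWS = 30000

def combine_current(lobound, hibound):
    """Current heuristic: exact equality with DEFAULT_INEQ_SEL means 'punted'."""
    if lobound == DEFAULT_INEQ_SEL or hibound == DEFAULT_INEQ_SEL:
        return DEFAULT_RANGE_INEQ_SEL
    return hibound + lobound - 1.0

def combine_patched(lobound, lo_isdefault, hibound, hi_isdefault):
    """Patched idea: trust an explicit isdefault flag instead of the value."""
    if lo_isdefault or hi_isdefault:
        return DEFAULT_RANGE_INEQ_SEL
    return hibound + lobound - 1.0

lobound = 29999 / 30000   # sel(id > 0), derived from statistics
hibound = 10000 / 30000   # sel(id < 10000), derived from statistics: 1/3

rows_current = round(NROWS * combine_current(lobound, hibound))
rows_patched = round(NROWS * combine_patched(lobound, False, hibound, False))
# rows_current takes the false "punted" path (NROWS * 0.005 = 150), while
# rows_patched uses hibound + lobound - 1.0 and stays close to the truth.
```

This shows why exact equality with the default constant is "not airtight": a perfectly ordinary statistics result can collide with it, whereas an explicit flag carried alongside the selectivity cannot.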
RE: Improve selectivity estimate for range queries
Hi, Thanks for the comments. I attach the v2 patch. > From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp] > Sent: Friday, December 21, 2018 12:25 PM > > Hello. > > At Thu, 20 Dec 2018 17:21:29 +0900, "Yuzuko Hosoya" > wrote in > <008701d4983d$02e731c0$08b59540$@lab.ntt.co.jp> > > In my environment, the selectivity for id > 0 was 0.1, > > and the selectivity for id < 1 was 0.1. Then, the > > value of rqlist->hibound and rqlist->lobound are set respectively. > > On the other hand, DEFAULT_INEQ_SEL is 0.. As a > > result, the final selectivity is computed according to > > DEFAULT_RANGE_INEQ_SEL, since the condition of the second if statement > > was satisfied. However, these selectivities were computed according to > > the statistics, so the final selectivity had to be calculated from > > rqlist->hibound + rqlist->lobound > - 1.0. > > My test case might be uncommon, but I think it would cause some problems. > > Yeah, its known behavior as the comment just above. If in your example query, > the table weer very large > and had an index it could run faultly an index scan over a fair amount of > tuples. But I'm not sure > how much it matters and your example doesn't show that. > > The behvavior discussed here is introduced by this commit. > > | commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a > | Author: Tom Lane > | Date: Tue Nov 9 00:34:46 2004 + > | > | Use a hopefully-more-reliable method of detecting default selectivity > | estimates when combining the estimates for a range query. As pointed > out > | by Miquel van Smoorenburg, the existing check for an impossible combined > | result would quite possibly fail to detect one default and one > non-default > | input. It seems better to use the default range query estimate in such > | cases. To do so, add a check for an estimate of exactly > DEFAULT_INEQ_SEL. 
> | This is a bit ugly because it introduces additional coupling between > | clauselist_selectivity and scalarltsel/scalargtsel, but it's not like > | there wasn't plenty already... > > > To handle such cases I've thought up of an idea based on a previous > > commit[1] which solved a similar problem related to > > DEFAULT_NUM_DISTINCT. Just like this modification, I added a flag > > which shows whether the selectivity > > The commit [1] added almost no complexity, but this does. It affects many > functions to introduce tighter > coupling between operator-selectivity functions and clause selectivity > functions. > isdefault travels alone too long just to bind remote functions. We might > need a more principled way to > handle that. > Yes, indeed. But I have not found a better way than this approach yet. > > was calculated according to the statistics or not to > > clauselist_selectivity, and used it as a condition of the if statement > > instead of > > rqlist->hibound == DEFAULT_INEQ_SEL and rqlist->lobound == DEFAULT_INEQ_SEL. > > I am afraid that changes were more than I had expected, but we can > > estimate selectivity correctly. > > > > Patch attached. Do you have any thoughts? > > I've just run over the patch, and I have some comments. > > + if (isdefault) > + rqelem->lobound_isdefault = true; > > This is taking the disjunction of all lobounds ever added; I suppose it should be > the last lobound's > isdefault. > Indeed. I fixed it. > As you see, four other instances of "selec ==/!= DEFAULT_*" exist in > mergejoinsel. Don't they lead > to a similar kind of problem? > I didn't encounter similar problems, but as you said, such problems could occur due to the condition selec != DEFAULT_INEQ_SEL. I changed these conditions into "!isdefault". > > > Selectivity lobound;    /* Selectivity of a var > something clause */ > Selectivity hibound;    /* Selectivity of a var < something clause */ > +    bool lobound_isdefault;    /* lobound is a default selectivity? 
*/ > +    bool hibound_isdefault;    /* hibound is a default selectivity? */ > } RangeQueryClause; > > Around [1], there was a discussion about introducing the notion of > uncertainty to selectivity. > The isdefault is a kind of uncertainty value indicating '0/100% uncertain'. > So my feeling is > that it would be better for Selectivity to have a certainty component, and then to build selectivity arithmetic > involving uncertainty, though I don't have a cle
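The fragility that motivates the isdefault flags can be shown in a few lines. This Python illustration uses PostgreSQL's actual DEFAULT_INEQ_SEL value; the 100-of-300 sample count is a made-up but plausible example.

```python
# PostgreSQL's default guess for "var > const" when no statistics apply
# (src/include/utils/selfuncs.h):
DEFAULT_INEQ_SEL = 0.3333333333333333

# Hypothetical: ANALYZE finds that 100 of 300 sampled rows satisfy the
# clause.  Both sides round to the same IEEE-754 double, so a
# "selec == DEFAULT_INEQ_SEL" test cannot tell this real estimate from
# a default guess.
real_estimate = 100 / 300
print(real_estimate == DEFAULT_INEQ_SEL)   # -> True
```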
RE: Improve selectivity estimate for range queries
Hi, Thanks for the comments, and I'm sorry for the late reply. > From: Tom Lane [mailto:t...@sss.pgh.pa.us] > Sent: Friday, January 11, 2019 7:04 AM > > Robert Haas writes: > > On Fri, Dec 21, 2018 at 11:50 AM Tom Lane wrote: > >> A smaller-footprint way to fix the immediate problem might be to > >> change the values of DEFAULT_INEQ_SEL and friends so that they're > >> even less likely to be matched by accident. That is, instead of > >> 0.3333333333333333, use 0.342 or some such. > > > That's not a dumb idea, but it seems pretty unprincipled to me, and to > > be honest I'm kind of surprised that you're not proposing something > > cleaner. > > The problem is the invasiveness of such a change (large) vs the benefit (not > so large). The upthread > patch attempted to add a separate signaling path, but it was very incomplete > --- and yet both I and > Horiguchi-san thought it was already too messy. > > Maybe at some point we'll go over to something reasonably principled, like > adding confidence intervals > to all selectivity estimates. That would be *really* invasive but perhaps > would bring enough benefit > to justify itself. But the current patch is just attempting to fix one > extremely narrow pain point, > and if that is all it's doing it should have a commensurately small > footprint. So I don't think the > submitted patch looks good from a cost/benefit standpoint. > Yes, I agree with you. Indeed, the patch I attached is poor in cost-effectiveness. However, I want to solve the problem that real estimates can happen to equal the default values, as in this case, even though it's a narrow pain point. So I tried distinguishing explicitly between real estimates and otherwise, as Robert said. The idea Tom proposed and Horiguchi-san tried seems reasonable, but I'm concerned whether range queries really cannot match 0.342 (or some such) by accident in any environment. Is the check Horiguchi-san did enough to prove that? 
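Hosoya-san's concern can be illustrated directly: even an arbitrary-looking constant such as 0.342 can be hit exactly by a real estimate. The sample counts below are hypothetical, chosen only to show the collision.

```python
# Tom Lane's suggestion: make the default constants "less likely to be
# matched by accident", e.g. 0.342.  But any double constant can still be
# produced exactly by ordinary statistics arithmetic.
candidate_default = 0.342
real_estimate = 171 / 500   # e.g. 171 of 500 sampled rows match the clause
print(real_estimate == candidate_default)   # -> True
```

171/500 is exactly 0.342 as a rational number, so the correctly rounded division yields the same IEEE-754 double as the literal constant, and an equality check would again misclassify a statistics-based estimate as a default.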
Best regards, Yuzuko Hosoya NTT Open Source Software Center
Proposal: Partitioning Advisor for PostgreSQL
Hello, I'm Yuzuko Hosoya. This is my first PostgreSQL project participation. I have been developing a partitioning advisor prototype with Julien Rouhaud. It will be a new feature of HypoPG[1], which is a PostgreSQL extension, and will help with partitioning design tuning. Currently, HypoPG only supports index design tuning; it allows users to define hypothetical indexes for real tables and shows the resulting queries' plan/cost with EXPLAIN as if they were actually constructed. Since declarative partitioning will be greatly improved in PostgreSQL 11 and further versions, there are emerging needs to support partitioning design tuning. This is why we are working on a partitioning advisor. We plan to release the first version of the partitioning advisor for PostgreSQL 11, and then improve it for PostgreSQL 12.

Overview of partitioning advisor
---
- Partitioning advisor allows users to define multiple hypothetical partitioning schemes on real tables and real data
- PostgreSQL can show the resulting queries' plan/cost with EXPLAIN using hypothetical partitioning schemes

Users can quickly check how their queries would behave if some tables were partitioned, and try different partitioning schemes (for instance, to balance query efficiency vs. maintenance efficiency). Partitioning advisor works as follows:

Usage
-
0. Consider this target table, t1

#= CREATE TABLE t1 (a int, b text);
#= INSERT INTO t1 SELECT i, 'test' FROM generate_series(1,299) i;
#= EXPLAIN SELECT * FROM t1;
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on t1  (cost=0.00..4.99 rows=299 width=9)
(1 row)

1. Partition the target table hypothetically

#= SELECT * FROM hypopg_partition_table('t1','partition by range(a)');

hypopg_partition_table() defines the hypothetical range-partitioned table 't1' with the partition key 'a' and stores this information in backend-local memory.

2. 
Create hypothetical partitions

#= SELECT * FROM hypopg_add_partition('t1_1','partition of t1 for values from (1) to (100)');
#= SELECT * FROM hypopg_add_partition('t1_2','partition of t1 for values from (100) to (300)');

hypopg_add_partition() defines the hypothetical partitions t1_1 and t1_2 according to their bounds 'from (1) to (100)' and 'from (100) to (300)' respectively, and stores this information in backend-local memory.

3. PostgreSQL can show the resulting queries' plan/cost with EXPLAIN

#= EXPLAIN SELECT * FROM t1;
                          QUERY PLAN
---------------------------------------------------------------
 Append  (cost=0.00..7.49 rows=299 width=9)
   ->  Seq Scan on t1 t1_1  (cost=0.00..1.99 rows=99 width=9)
   ->  Seq Scan on t1 t1_2  (cost=0.00..4.00 rows=200 width=9)
(3 rows)

PostgreSQL retrieves the hypothetical partitioning schemes from HypoPG, and then, if the referenced table is defined as a hypothetical partitioned table, creates plans using them. This is a simple example; in addition, the advisor enables us to simulate range/list/hash partitioning, partition pruning, N-way join, and partition-wise join/aggregation. It is already helpful for users designing partitioning schemes.

Current implementation
---
We mainly use get_relation_info_hook(). What we do in this hook is inject hypothetical partitioning schemes according to the user's definitions. At first, we do all the processing that is done in expand_inherited_tables(). Specifically, we expand root->simple_rte_array and root->parse->rtable, rewrite the target table's RangeTblEntry as a partitioned table, and create RangeTblEntries and AppendRelInfos for all hypothetical partitions. Besides that, we set the hypothetical partition's name into rte->alias->aliasname at this time, so that EXPLAIN displays the hypothetical partition's name. And then, we rewrite RelOptInfo as needed. Specifically, we add partition information, which is normally set in set_relation_partition_info(), to hypothetical partitioned tables, and set rel->tuples and rel->pages for hypothetical partitions. 
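As a quick cross-check of the hypothetical plan in the usage example, this Python sketch (not part of HypoPG) routes the 299 generated rows through the declared bounds, using PostgreSQL's inclusive-lower/exclusive-upper range-partition semantics. The row counts come out exactly as in the EXPLAIN output above (rows=99 and rows=200).

```python
# Route ids 1..299 through the two hypothetical range partitions of t1.
partitions = {
    't1_1': (1, 100),    # FOR VALUES FROM (1) TO (100)
    't1_2': (100, 300),  # FOR VALUES FROM (100) TO (300)
}

def route(a):
    """Return the partition that would hold a row with partition key 'a'."""
    for name, (lo, hi) in partitions.items():
        if lo <= a < hi:          # lower bound inclusive, upper exclusive
            return name
    return 'DEFAULT'              # t1 has no default partition in the example

counts = {}
for a in range(1, 300):           # INSERT ... generate_series(1,299)
    p = route(a)
    counts[p] = counts.get(p, 0) + 1
print(counts)   # -> {'t1_1': 99, 't1_2': 200}
```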
However, PostgreSQL isn't designed to have hypothetical tables, so we face some blockers in the implementation, as follows. We'd like to discuss these topics.

Topics of discussion
---
- Expanding a partition's RTE

We have to do all the processing which is done in expand_inherited_tables() for hypothetical partitions. But, since there are no hooks around there, we use get_relation_info_hook() as mentioned above. In this case, we cannot simulate UPDATE queries correctly, because inheritance_planner(), which handles UPDATE queries, is called before get_relation_info_hook(). Therefore, we'd like to see if we could add a hook at ex