Re: Autovacuum on partitioned table (autoanalyze)

2020-12-02 Thread yuzuko
Hello Alvaro,

Thank you for your comments.

>
> > In second thought about the reason for the "toprel_oid". It is perhaps
> > to avoid "wrongly" propagated values to ancestors after a manual
> > ANALYZE on a partitioned table.  But the same happens after an
> > autoanalyze iteration if some of the ancestors of a leaf relation are
> > analyzed before the leaf relation in a autoanalyze iteration.  That
> > can trigger an unnecessary analyzing for some of the ancestors.
>
> I'm not sure I understand this point.  I think we should only trigger
> this on analyzes of *leaf* partitions, not intermediate partitioned
> relations.  That way you would never get these unnecessary analyzes.
> Am I missing something?
>
> (So with my proposal in the previous email, we would send the list of
> ancestor relations after analyzing a leaf partition.  Whenever we
> analyze a non-leaf, then the list of ancestors is sent as an empty
> list.)
>
The problem Horiguchi-san mentioned is as follows:

create table p1 (i int) partition by range(i);
create table p1_1 partition of p1 for values from (0) to (500)
partition by range(i);
create table p1_1_1 partition of p1_1 for values from (0) to (300);
insert into p1 select generate_series(0,299);

-- After auto analyze (first time)
postgres=# select relname, n_mod_since_analyze, last_autoanalyze from
pg_stat_all_tables where relname in ('p1','p1_1','p1_1_1');
 relname | n_mod_since_analyze |       last_autoanalyze
---------+---------------------+-------------------------------
 p1      |                 300 |
 p1_1    |                 300 |
 p1_1_1  |                   0 | 2020-12-02 22:24:18.753574+09
(3 rows)

-- Insert more rows
postgres=# insert into p1 select generate_series(0,199);
postgres=# select relname, n_mod_since_analyze, last_autoanalyze from
pg_stat_all_tables where relname in ('p1','p1_1','p1_1_1');
 relname | n_mod_since_analyze |       last_autoanalyze
---------+---------------------+-------------------------------
 p1      |                 300 |
 p1_1    |                 300 |
 p1_1_1  |                 200 | 2020-12-02 22:24:18.753574+09
(3 rows)

-- After auto analyze (second time)
postgres=# select relname, n_mod_since_analyze, last_autoanalyze from
pg_stat_all_tables where relname in ('p1','p1_1','p1_1_1');
 relname | n_mod_since_analyze |       last_autoanalyze
---------+---------------------+-------------------------------
 p1      |                   0 | 2020-12-02 22:25:18.857248+09
 p1_1    |                 200 | 2020-12-02 22:25:18.661932+09
 p1_1_1  |                   0 | 2020-12-02 22:25:18.792078+09
(3 rows)

After the second autoanalyze, every relation's n_mod_since_analyze
should be 0, but p1_1's is not, because p1_1 was analyzed before p1_1_1
in the same iteration.  So p1_1 will be analyzed again, needlessly, in
the third autoanalyze.  That is, propagating changes_since_analyze to
*all* ancestors may cause unnecessary analyzes even if we do it only
when analyzing leaf partitions.  So I think we should track the
ancestors that have not been analyzed in the current iteration, as
Horiguchi-san proposed.

Regarding your idea:
> typedef struct PgStat_MsgAnalyze
> {
>PgStat_MsgHdr  m_hdr;
>Oidm_databaseid;
>Oidm_tableoid;
>bool   m_autovacuum;
>bool   m_resetcounter;
>TimestampTzm_analyzetime;
>PgStat_Counter m_live_tuples;
>PgStat_Counter m_dead_tuples;
>intm_nancestors;
>Oidm_ancestors[PGSTAT_NUM_ANCESTORENTRIES];
> } PgStat_MsgAnalyze;

I'm not sure, but how about storing only the ancestors that aren't
analyzed in the current iteration in
m_ancestors[PGSTAT_NUM_ANCESTORENTRIES]?


> > > > Regarding the counters in pg_stat_all_tables: maybe some of these
> > > > should be null rather than zero?  Or else you should make an 0001
> > > > patch to fully implement this view, with all relevant counters, not
> > > > just n_mod_since_analyze, last_*analyze, and *analyze_count.  These
> > > > are specifically misleading:
> > > >
> > > > last_vacuum |
> > > > last_autovacuum |
> > > > n_ins_since_vacuum  | 0
> > > > vacuum_count| 0
> > > > autovacuum_count| 0
> > > >
> > > I haven't modified this part yet, but did you mean that we should
> > > set the vacuum-related counters to null because partitioned tables
> > > are not vacuumed?
> >
> > Perhaps because partitioned tables *cannot* be vacuumed.  I'm not sure
> > what is the best way here.  Showing null seems reasonable but I'm not
> > sure that doesn't break anything.

Re: Autovacuum on partitioned table (autoanalyze)

2020-12-14 Thread yuzuko
Hello Alvaro,

On Thu, Dec 3, 2020 at 10:28 PM Alvaro Herrera  wrote:
>
> Hello Yuzuko,
>
> On 2020-Dec-02, yuzuko wrote:
>
> > The problem Horiguchi-san mentioned is as follows:
> > [explanation]
>
> Hmm, I see.  So the problem is that if some ancestor is analyzed first,
> then analyze of one of its partition will cause a redundant analyze of
> the ancestor, because the number of tuples that is propagated from the
> partition represents a set that had already been included in the
> ancestor's analysis.
>
> If the problem was just that, then I think it would be very simple to
> solve: just make sure to sort the tables to vacuum so that all leaves
> are vacuumed first, and then all ancestors, sorted from the bottom up.
> Problem solved.
>

Indeed.  When I discussed this with Horiguchi-san before, he suggested
the same approach:
> So, to propagate the count properly, we need to analyze relations in
> leaf-to-root order, or propagate the counter only to ancestors that
> haven't been processed in the current iteration.  It seems a bit too
> complex to sort analyze relations in that order.

but we didn't choose it because of its complexity, as you also noted.

> But I'm not sure that that's the whole story, for two reasons: one, two
> workers can run simultaneously, where one analyzes the partition and the
> other analyzes the ancestor.  Then the order is not guaranteed (and
> each process will get no effect from remembering whether it did that one
> or not).  Second, manual analyzes can occur in any order.
>
> Maybe it's more useful to think about this in terms of remembering that
> partition P had changed_tuples set to N when we analyzed ancestor A.
> Then, when we analyze partition P, we send the message listing A as
> ancestor; on receipt of that message, we see M+N changed tuples in P,
> but we know that we had already seen N, so we only record M.
>
> I'm not sure how to implement this idea however, since on analyze of
> ancestor A we don't have the list of partitions, so we can't know the N
> for each partition.
>
I thought about it for a while, but I couldn't come up with a way to
implement it.  Also, I think the other approach Horiguchi-san suggested
in [1] would be a simpler way to solve the problem we are facing.

Attached is a new patch based on his.  What do you think?

[1] 
https://www.postgresql.org/message-id/20201110.203557.1420746510378864931.horikyota.ntt%40gmail.com

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v12_autovacuum_on_partitioned_table.patch
Description: Binary data


Re: Release SPI plans for referential integrity with DISCARD ALL

2021-03-10 Thread yuzuko
Hello,

I thought about this suggestion again.

Amit's patch proposed in thread [1] can eliminate SPI plans from
INSERT/UPDATE triggers, so our memory pressure issue would be solved.
But as far as I can see from that thread, Amit's patch doesn't cover
the DELETE case.  It is not a common case, but it still carries a risk
of extreme memory pressure.  Considering that, this suggestion might
still have value, as Corey said before.

I improved a patch according to Peter's following comment :
> but I think the
> solution of dropping all cached plans as part of DISCARD ALL seems a bit
> too extreme of a solution.  In the context of connection pooling,
> getting a new session with pre-cached plans seems like a good thing, and
> this change could potentially have a performance impact without a
> practical way to opt out.

In the new patch, I separated discarding the SPI plan caches for RI
from DISCARD ALL and added a new option, RIPLANS, to the DISCARD
command to do that.  A new function, ResetRIPlanCache(), which clears
the SPI plan cache, is called by DISCARD ALL or DISCARD RIPLANS.  The
amount of modification is small, and this option can be removed easily
once it is no longer needed, so I think we can use this patch as a
provisional solution.
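For clarity, here is how the new syntax would be used (a sketch assuming
the patch is applied; DISCARD RIPLANS is the option added by this patch
and does not exist in stock PostgreSQL):

-- Release only the cached SPI plans used for RI checks:
DISCARD RIPLANS;
-- DISCARD ALL also releases them, along with everything else:
DISCARD ALL;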

Any thoughts?

[1] 
https://www.postgresql.org/message-id/flat/CA%2BHiwqGkfJfYdeq5vHPh6eqPKjSbfpDDY%2Bj-kXYFePQedtSLeg%40mail.gmail.com

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v2_discard_ri_spi_plans.patch
Description: Binary data


Re: Autovacuum on partitioned table (autoanalyze)

2020-11-04 Thread yuzuko
Horiguchi-san,

Thank you for your comments.

On Fri, Oct 23, 2020 at 8:23 PM Kyotaro Horiguchi wrote:
>
> Thanks you for the new version.
>
> At Fri, 23 Oct 2020 15:12:51 +0900, yuzuko  wrote in
> > Hello,
> >
> > I reconsidered a way based on the v5 patch in line with
> > Horiguchi-san's comment.
> >
> > This approach is as follows:
> > - A partitioned table is checked for whether it needs analyze, like a
> >   plain table, in relation_needs_vacanalyze().  To do this, we store
> >   the partitioned table's stats (changes_since_analyze).
> > - The partitioned table's changes_since_analyze is updated when a leaf
> >   partition is analyzed, by propagating the leaf's changes_since_analyze.
> >   It is then used in the above check at the next scheduled analyze time.
> >   That is, the partitioned table is analyzed after its leaf partitions.
> > - The propagation differs between autoanalyze and plain ANALYZE.
> >   In autoanalyze, a leaf partition's changes_since_analyze is propagated
> >   to *all* ancestors, whereas a plain ANALYZE on an inheritance tree
> >   propagates only to ancestors not included in the tree, to avoid
> >   needless counting.
> >
> > Attach the latest patch to this email.
> > Could you check it again please?
>
> +       /*
> +        * Get its all ancestors to propagate changes_since_analyze count.
> +        * However, when ANALYZE inheritance tree, we get ancestors of
> +        * toprel_oid to avoid needless counting.
> +        */
> +       if (!OidIsValid(toprel_oid))
> +           ancestors = get_partition_ancestors(RelationGetRelid(rel));
> +       else
> +           ancestors = get_partition_ancestors(toprel_oid);
>
> This comment doesn't explain what the code intends, only what the
> code does.
>
> The reason for the difference is that if we have a valid toprel_oid,
> we analyze all descendants of the relation this time, and if we
> propagate the number to the descendants of the top relation, the next
> analyze on those relations could happen sooner than expected.
>
I modified this comment according to your advice.

>
> -   msg.m_live_tuples = livetuples;
> -   msg.m_dead_tuples = deadtuples;
> +   msg.m_live_tuples = (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> +       ? 0     /* if this is a partitioned table, skip modifying */
> +       : livetuples;
> +   msg.m_dead_tuples = (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> +       ? 0     /* if this is a partitioned table, skip modifying */
> +       : deadtuples;
>
> Two successive branches with the same condition look odd.  And we
> need an explanation of *why* we don't set the values for partitioned
> tables.

I moved this part into the previous block, where livetuples and
deadtuples are set.  The reason those counters are set to 0 when the
given relation is a partitioned table is that such a table holds no
data of its own.  The changes_since_analyze counter, however, must be
maintained as an exception so we can check whether partitioned tables
need autoanalyze.  I added this explanation to the comment of this
function.

>
> +   foreach(lc, ancestors)
> +   {
> +   Oid parentOid = lfirst_oid(lc);
> +   Relation parentrel;
> +
> +   parentrel = table_open(parentOid, AccessShareLock);
>
> I'm not sure, but not all of the ancestors can be a parent (in
> other words, a parent of a parent of mine is not a parent of
> mine).  Isn't just "rel" sufficient?
>
I changed 'parentrel' to 'rel'.

>
>
> -* Report ANALYZE to the stats collector, too.  However, if doing
> -* inherited stats we shouldn't report, because the stats collector 
> only
> -* tracks per-table stats.  Reset the changes_since_analyze counter 
> only
> -* if we analyzed all columns; otherwise, there is still work for
> -* auto-analyze to do.
> +* Report ANALYZE to the stats collector, too.  Reset the
> +* changes_since_analyze counter only if we analyzed all columns;
> +* otherwise, there is still work for auto-analyze to do.
>  */
> -   if (!inh)
> +   if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> pgstat_report_analyze(onerel, totalrows, totaldeadrows,
>
> This still rejects traditional inheritance nonleaf relations. But if
> we remove the description about that completely in the comment above,
> we should support trad

Re: Autovacuum on partitioned table (autoanalyze)

2020-11-04 Thread yuzuko
Hi Justin,

Thank you for your comments.
I attached the latest patch (v11) to the previous email.

>
> +        * Get its all ancestors to propagate changes_since_analyze count.
> +        * However, when ANALYZE inheritance tree, we get ancestors of
> +        * toprel_oid to avoid needless counting.
>
> => I don't understand that comment.
>
I fixed that comment.

> +       /* Find all members of inheritance set taking AccessShareLock */
> +       children = find_all_inheritors(relid, AccessShareLock, NULL);
>
> => Do you know that returns the table itself?  And in pg14dev, each
> partitioned table has reltuples = -1, not zero...
>
> +           /* Skip foreign partitions */
> +           if (childclass->relkind == RELKIND_FOREIGN_TABLE)
> +               continue;
>
> => Michael's suggestion is to use RELKIND_HAS_STORAGE to skip both
> foreign and partitioned tables.
>
I overlooked that and revised it according to your comments.

> Also, you called SearchSysCacheCopy1, but didn't free the tuple.  I don't 
> think
> you need to copy it anyway - just call ReleaseSysCache().
>
Fixed it.

> Regarding the counters in pg_stat_all_tables: maybe some of these should be
> null rather than zero?  Or else you should make an 0001 patch to fully
> implement this view, with all relevant counters, not just n_mod_since_analyze,
> last_*analyze, and *analyze_count.  These are specifically misleading:
>
> last_vacuum |
> last_autovacuum |
> n_ins_since_vacuum  | 0
> vacuum_count| 0
> autovacuum_count| 0
>
I haven't modified this part yet, but did you mean that we should set
the vacuum-related counters to null because partitioned tables are not
vacuumed?

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Release SPI plans for referential integrity with DISCARD ALL

2021-01-13 Thread yuzuko
Hello,

We found a problem where a huge amount of memory was consumed when we
created a foreign key on a partitioned table with a lot of partitions
and then accessed them, as discussed in [1].  Kuroda-san's idea
proposed in that thread is to reduce the number of cached SPI plans by
combining several plans into one.  But we are also considering another
option, which lets users release the cached SPI plans for referential
integrity, along with plain cached plans, via DISCARD ALL.  To do that,
we added a new function, RI_DropAllPreparedPlan(), which deletes all
plans from ri_query_cache, and modified DISCARD ALL to execute that
function.

I tested it using the test case Yamada-san attached in [2]:
[Before DISCARD ALL]
=# SELECT name, sum(used_bytes) as bytes,
pg_size_pretty(sum(used_bytes)) FROM pg_backend_memory_contexts WHERE
name LIKE 'Cached%' GROUP BY name;
       name       |   bytes   | pg_size_pretty
------------------+-----------+----------------
 CachedPlanQuery  |   1326280 | 1295 kB
 CachedPlanSource |   1474616 | 1440 kB
 CachedPlan       | 744009168 | 710 MB
(3 rows)

[After DISCARD ALL]
=# DISCARD ALL;
DISCARD ALL

=# SELECT name, sum(used_bytes) as bytes,
pg_size_pretty(sum(used_bytes)) FROM pg_backend_memory_contexts WHERE
name LIKE 'Cached%' GROUP BY name;
       name       | bytes | pg_size_pretty
------------------+-------+----------------
 CachedPlanQuery  | 10280 | 10 kB
 CachedPlanSource | 14616 | 14 kB
 CachedPlan       | 13168 | 13 kB
(3 rows)

In addition, the following case would also be addressed by this
approach: when many processes reference many tables that have foreign
key constraints defined, a huge amount of memory is consumed regardless
of whether the referenced tables are partitioned.
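To illustrate the mechanism, a minimal sketch (with hypothetical table
names) of how such plans accumulate: the first statement in a backend
that fires an RI trigger builds an SPI plan and caches it in
ri_query_cache, roughly one per constraint per backend.

CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child (id int REFERENCES parent (id));
INSERT INTO parent VALUES (1);
-- This INSERT fires the FK check trigger, which builds and caches an
-- SPI plan; repeated across many constraints and many backends, the
-- CachedPlan memory shown above adds up:
INSERT INTO child VALUES (1);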

Attached the patch.  Any thoughts?


[1] 
https://www.postgresql.org/message-id/flat/cab4b85d-9292-967d-adf2-be0d803c3e23%40nttcom.co.jp_1
[2] https://www.postgresql.org/message-id/cab4b85d-9292-967d-adf2-be0d803c3e23%40nttcom.co.jp_1

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v1_discard_ri_spi_plans.patch
Description: Binary data


Re: Release SPI plans for referential integrity with DISCARD ALL

2021-01-18 Thread yuzuko
Hi Corey,

Thank you for sharing.

> Amit's patch is now available in this thread [1]. I'm curious if it has any 
> effect on your memory pressure issue.
>
I just found that thread.  I'll check the patch.

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Release SPI plans for referential integrity with DISCARD ALL

2021-01-24 Thread yuzuko
Hello,

Thank you for your comments.
Following Corey's advice, I applied Amit's patches proposed in this
email [1] and confirmed that our memory pressure problem was solved.
So dropping cached plans with DISCARD is no longer necessary.

[1] 
https://www.postgresql.org/message-id/CA%2BHiwqG1qQuBwApueaUfA855UJ4TiSgFkPF34hQDWx3tOChV5w%40mail.gmail.com

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Autovacuum on partitioned table (autoanalyze)

2021-04-01 Thread yuzuko
Hi Tomas,

Thank you for reviewing the patch.

> Firstly, the patch propagates the changes_since_analyze values from
> do_analyze_rel, i.e. from the worker after it analyzes the relation.
> That may easily lead to cases with unnecessary analyzes - consider a
> partitioned table with 4 child relations:
>  [ explanation ]
>
I didn't realize that until now.  Indeed, this approach increments the
parent's changes_since_analyze counter according to a leaf partition's
counter when the leaf partition is analyzed, so it will cause
unnecessary ANALYZEs of partitioned tables, as you described.


> I propose a different approach - instead of propagating the counts in
> do_analyze_rel for individual leaf tables, let's do that in bulk in
> relation_needs_vacanalyze. Before the (existing) first pass over
> pg_class, we can add a new one, propagating counts from leaf tables to
> parents.
>
Thank you for your suggestion.  I think it could solve all the issues
you mentioned.  I modified the patch based on this approach:

- Create a new counter, PgStat_Counter changes_since_analyze_reported,
  to track the changes_since_analyze we have already propagated to
  ancestors.  This is used for internal processing and users may not
  need to know it, so the counter is not displayed in the
  pg_stat_all_tables view for now.

- Create a new function, pgstat_propagate_changes() which propagates
  changes_since_analyze counter to all ancestors and saves
  changes_since_analyze_reported.  This function is called in
  do_autovacuum() before relation_needs_vacanalyze().


> Note: I do have some ideas about how to improve that, I've started a
> separate thread about it [1].
>
I'm also interested in merging children's statistics for partitioned tables
because it will make ANALYZE on inheritance trees more efficient.
So I'll check it later.

> I forgot to mention one additional thing yesterday - I wonder if we need
> to do something similar after a partition is attached/detached. That can
> also change the parent's statistics significantly, so maybe we should
> handle all partition's rows as changes_since_analyze? Not necessarily
> something this patch has to handle, but might be related.
>
Regarding attached/detached partitions, I think we should update the
statistics of partitioned tables according to the new inheritance tree.
The latest patch doesn't handle this case yet, but I'll give it a try
soon.

Attached is the v13 patch.  Could you please check it again?

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v13_autovacuum_on_partitioned_table.patch
Description: Binary data


Re: Autovacuum on partitioned table (autoanalyze)

2021-04-06 Thread yuzuko
Hello,

Thank you for reviewing.
I'm working on fixing the patch according to the comments.
I'll send it as soon as I can.

> On 2021-04-06 16:56:49 -0400, Alvaro Herrera wrote:
> > I think there is a good reason to treat them the same: pgstat does not
> > have a provision to keep stats both of the table with children, and the
> > table without children.  It can only have one of those.  For
> > partitioning that doesn't matter: since the table-without-children
> > doesn't have anything on its own (no scans, no tuples, no nothing) then
> > we can just use the entry to store the table-with-children data.  But
> > for the inheritance case, the parent can have its own tuples and counts
> > its own scans and so on; so if we change things, we'll overwrite the
> > stats.  Maybe in the long-term we should allow pgstat to differentiate
> > those cases, but that seems not in scope for this patch.
>
> FWIW, I think it shouldn't be too hard to do that once the shared memory
> stats patch goes in (not 14, unfortunately). The hardest part will be to
> avoid exploding the number of interface functions, but I think we can
> figure out a way to deal with that.
>
Thinking about traditional inheritance, I realized that we would need
additional handling to support it because, unlike declarative
partitioning, a parent may hold rows of its own, as Alvaro mentioned.
So I think this patch should support only declarative partitioning for
now; what do you think?  Also, if the shared memory stats patch would
let us solve this at low cost, should we wait for that patch?

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Autovacuum on partitioned table (autoanalyze)

2021-04-07 Thread yuzuko
Hi,

I fixed the patch according to the following comments.
Attached is the latest patch, based on the v14 patch Alvaro posted earlier.

On Mon, Apr 5, 2021 at 4:08 AM Tomas Vondra wrote:
>
> On 4/3/21 9:42 PM, Alvaro Herrera wrote:
> > Thanks for the quick rework.  I like this design much better and I think
> > this is pretty close to committable.  Here's a rebased copy with some
> > small cleanups (most notably, avoid calling pgstat_propagate_changes
> > when the partition doesn't have a tabstat entry; also, free the lists
> > that are allocated in a couple of places).
> >
> > I didn't actually verify that it works.
> >
>
> Yeah, this approach seems much simpler, I think. That being said, I
> think there's a couple issues:
>
> 1) I still don't understand why inheritance and declarative partitioning
> are treated differently. Seems unnecessary nad surprising, but maybe
> there's a good reason?
>
As we discussed in this thread, this patch should handle only
declarative partitioning for now.

>
> 2) pgstat_recv_tabstat
>
> Should it really reset changes_since_analyze_reported in both branches?
> AFAICS if the "found" branch does this
>
> tabentry->changes_since_analyze_reported = 0;
>
> that means we lose the counter any time tabstats are received, no?
> That'd be wrong, because we'd propagate the changes repeatedly.
>
I changed it so that the changes_since_analyze_reported counter is not reset there.

>
> 3) pgstat_recv_analyze
>
> Shouldn't it propagate the counters before resetting them? I understand
> that for the just-analyzed relation we can't do better, but why not to
> propagate the counters to parents? (Not necessarily from this place in
> the stat collector, maybe the analyze process should do that.)
>
I realized that we should propagate the counters for manual ANALYZE
too, thanks to the examples you offered in another email.  I fixed that
for manual ANALYZE.

>
> 4) pgstat_recv_reportedchanges
>
> I think this needs to be more careful when updating the value - the
> stats collector might have received other messages modifying those
> counters (including e.g. PGSTAT_MTYPE_ANALYZE with a reset), so maybe we
> can get into situation with
>
>   (changes_since_analyze_reported > changes_since_analyze)
>
> if we just blindly increment the value. I'd bet would lead to funny
> stuff. So maybe this should be careful to never exceed this?
>
pgstat_propagate_changes() calls pgstat_report_reportedchanges() only
if (changes_since_analyze_reported < changes_since_analyze), so I think
we won't get into the situation where
(changes_since_analyze_reported > changes_since_analyze),
but am I missing something?

> I also realized relation_needs_vacanalyze is not really doing what I
> suggested - it propagates the counts, but does so in the existing loop
> which checks which relations need vacuum/analyze.
>
> That means we may skip the parent table in the *current* round, because
> it'll see the old (not yet updated) counts. It's likely to be processed
> in the next autovacuum round, but that may actually not happen. The
> trouble is the reltuples for the parent is calculated using *current*
> child reltuples values, but we're comparing it to the *old* value of
> changes_since_analyze. So e.g. if enough rows were inserted into the
> partitions, it may still be below the analyze threshold.
>
Indeed, the partitioned table was not analyzed at the same time as its
leaf partitions because of the delay in propagating the counters.
Following your proposal, I added a separate loop that propagates the
counters before collecting the list of relations to vacuum/analyze.

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v15_autovacuum_on_partitioned_table.patch
Description: Binary data


Re: Autovacuum on partitioned table (autoanalyze)

2020-08-16 Thread yuzuko
I'm sorry for the late reply.

> This version seems to fail under -Werror, which is used in the Travis builds:
>
> autovacuum.c: In function ‘relation_needs_vacanalyze’:
> autovacuum.c:3117:59: error: ‘reltuples’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>    anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
> autovacuum.c:2972:9: note: ‘reltuples’ was declared here
>   float4  reltuples;  /* pg_class.reltuples */
>

I attached the latest patch, which fixes the above -Werror failure.
Could you please check it again?

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v9_autovacuum_on_partitioned_table.patch
Description: Binary data


Re: Problem with default partition pruning

2019-03-05 Thread yuzuko
Imai-san,

Thanks for sharing your tests!

On Thu, Feb 28, 2019 at 5:27 PM Imai, Yoshikazu wrote:
>
> Hosoya-san
>
> On Wed, Feb 27, 2019 at 6:51 AM, Yuzuko Hosoya wrote:
> > > From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> > > Sent: Wednesday, February 27, 2019 11:22 AM
> > >
> > > Hosoya-san,
> > >
> > > On 2019/02/22 17:14, Yuzuko Hosoya wrote:
> > > > Hi,
> > > >
> > > > I found a bug in default partition pruning when executing a range
> > > > query.
> > > >
> > > > -
> > > > postgres=# create table test1(id int, val text) partition by range
> > > > (id); postgres=# create table test1_1 partition of test1 for values
> > > > from (0) to (100); postgres=# create table test1_2 partition of
> > > > test1 for values from (150) to (200); postgres=# create table
> > > > test1_def partition of test1 default;
> > > >
> > > > postgres=# explain select * from test1 where id > 0 and id < 30;
> > > >QUERY PLAN
> > > > --------------------------------------------------------------
> > > >  Append  (cost=0.00..11.83 rows=59 width=11)
> > > >->  Seq Scan on test1_1  (cost=0.00..5.00 rows=58 width=11)
> > > >  Filter: ((id > 0) AND (id < 30))
> > > >->  Seq Scan on test1_def  (cost=0.00..6.53 rows=1 width=12)
> > > >  Filter: ((id > 0) AND (id < 30))
> > > > (5 rows)
> > > >
> > > > There is no need to scan the default partition, but it's scanned.
> > > > -
> > > >
> > > > In the current implementation, whether the default partition is
> > > > scanned or not is determined according to each condition of the
> > > > given WHERE clause at get_matching_range_bounds().  In this example,
> > > > scan_default is set true according to id > 0 because id >= 200
> > > > matches the default partition.  Similarly, according to id < 30,
> > > > scan_default is set true.  Then these results are combined according
> > > > to AND/OR at perform_pruning_combine_step().
> > > > In this case, the final result's scan_default is set true.
> > > >
> > > > The modifications I made are as follows:
> > > > - get_matching_range_bounds() determines only offsets of range bounds
> > > >   according to each condition
> > > > - These results are combined at perform_pruning_combine_step()
> > > > - Whether the default partition is scanned or not is determined at
> > > >   get_matching_partitions()
> > > >
> > > > Attached the patch.  Any feedback is greatly appreciated.
> > >
> > > Thank you for reporting.  Can you please add this to the March CF in
> > > the Bugs category so as not to lose track of this?
> > >
> > > I will try to send review comments soon.
> > >
> > Thank you for your reply.  I added this to March CF.
>
> I tested with a simple use case and confirmed it works correctly, as below.
>
> In case using between clause:
> postgres=# create table test1(id int, val text) partition by range (id);
> postgres=# create table test1_1 partition of test1 for values from (0) to 
> (100);
> postgres=# create table test1_2 partition of test1 for values from (150) to 
> (200);
> postgres=# create table test1_def partition of test1 default;
>
> [HEAD]
> postgres=# explain analyze select * from test1 where id between 0 and 50;
>                                                     QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------
>  Append  (cost=0.00..58.16 rows=12 width=36) (actual time=0.008..0.008 rows=0 loops=1)
>    ->  Seq Scan on test1_1  (cost=0.00..29.05 rows=6 width=36) (actual time=0.005..0.005 rows=0 loops=1)
>          Filter: ((id >= 0) AND (id <= 50))
>    ->  Seq Scan on test1_def  (cost=0.00..29.05 rows=6 width=36) (actual time=0.002..0.002 rows=0 loops=1)
>          Filter: ((id >= 0) AND (id <= 50))
>
>
> [patched]
> postgres=# explain analyze select * from test1 where id between 0 and 50;
>                                                    QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
>  Append  (cost=0.00..29.08 rows=6 width=36) (actual time=0.006..0.006 rows=0 loops=1)
>    ->  Seq Scan on test1_1  (cost=0.00..29.05 rows=6 width=36) (actu

Re: Problem with default partition pruning

2019-03-05 Thread yuzuko
Hi Ibrar,

On Tue, Mar 5, 2019 at 2:37 AM Ibrar Ahmed  wrote:
>
> Hi  Yuzuko Hosoya,
>
> Ignore my last message, I think this is also a legitimate scan on default 
> partition.
>
Oh, I got it.  Thanks a lot.

>
> On Mon, Mar 4, 2019 at 10:29 PM Ibrar Ahmed  wrote:
>>
>> Hi
>>
>> The patch works fine for me, but I have one test case where the default
>> partition is still scanned.
>>
>> postgres=# explain select * from test1 where (id < 10) and true;
>>                            QUERY PLAN
>> ----------------------------------------------------------------
>>  Append  (cost=0.00..55.98 rows=846 width=36)
>>    ->  Seq Scan on test1_1  (cost=0.00..25.88 rows=423 width=36)
>>          Filter: (id < 10)
>>    ->  Seq Scan on test1_def  (cost=0.00..25.88 rows=423 width=36)
>>          Filter: (id < 10)
>> (5 rows)
>
>
>
> --
> Ibrar Ahmed

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center



Re: Problem with default partition pruning

2019-08-04 Thread yuzuko
Hi Alvaro and Amit,

Thanks for reviewing and fixing the patch.
Also, I confirmed that the commit message explains the modification
clearly.  Thanks a lot.

Yuzuko Hosoya

On Mon, Aug 5, 2019 at 12:24 AM Alvaro Herrera  wrote:
>
> On 2019-Aug-04, Alvaro Herrera wrote:
>
> > So this is the best commit messages I could come up with at this stupid
> > hour.  I think the wording is pretty poor but at least it seems correct.
> > I'm not sure I'll be able to get this pushed tomorrow, but I'll try.
>
> Pushed.  Since this is Sunday before minors, I'll be checking buildfarm
> and will summarily revert if anything goes wrong.
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Problem with default partition pruning

2019-08-04 Thread yuzuko
Hi Alvaro,

Thanks for reviewing.
The modification you made seems correct to me.

However, I'm still concerned that the block
-----
    if (partconstr)
    {
        partconstr = (List *)
            expression_planner((Expr *) partconstr);
        if (context->rel->relid != 1)
            ChangeVarNodes((Node *) partconstr, 1,
                           context->rel->relid, 0);
        if (predicate_refuted_by(partconstr,
                                 list_make1(clause),
                                 false))
        {
            context->contradictory = true;
            return NIL;
        }
    }
-----
is written in the right place as Amit explained [1].

At first, we tried to fix the following problematic query
which was reported by Thibaut before:

create table p (a int) partition by range (a);
create table p1 partition of p for values from (0) to (20) partition
by range (a);
create table p11 partition of p1 for values from (0) to (10);
create table p1_def partition of p1 default;
explain select * from p1 where a = 25 or a = 5;
  QUERY PLAN
──────────────────────────────────────────────────────────
 Append  (cost=0.00..96.75 rows=50 width=4)
   ->  Seq Scan on p11  (cost=0.00..48.25 rows=25 width=4)
 Filter: ((a = 25) OR (a = 5))
   ->  Seq Scan on p1_def  (cost=0.00..48.25 rows=25 width=4)
 Filter: ((a = 25) OR (a = 5))
(5 rows)

And Amit proposed a patch to fix this problem [2].
In that patch, the above if() block was placed elsewhere.
After that, I found that the following query also doesn't work correctly:

explain select * from p1 where a = 25;
  QUERY PLAN
───────────────────────────────────────────────────────────
 Append  (cost=0.00..41.94 rows=13 width=4)
   ->  Seq Scan on p1_def  (cost=0.00..41.88 rows=13 width=4)
 Filter: (a = 25)
(3 rows)

So I proposed moving the if() block to its current place.
The latest patch solves both queries, but I found that the latter
problem can also be solved just by setting constraint_exclusion = on.
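For reference, that workaround looks like this (using the tables
defined above):

SET constraint_exclusion = on;
EXPLAIN SELECT * FROM p1 WHERE a = 25;
-- with constraint exclusion enabled, p1_def's partition constraint
-- refutes a = 25, so the default partition is excluded from the plan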

Which approach will be suitable?


[1] 
https://www.postgresql.org/message-id/CA%2BHiwqG%2BnSD0vcJacArYgYcFVtpTJQ0fx1gBgoZkA_isKd6Z2w%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/9bb31dfe-b0d0-53f3-3ea6-e64b811424cf%40lab.ntt.co.jp

Best regards,

Yuzuko Hosoya
NTT Open Source Software Center

On Mon, Aug 5, 2019 at 11:03 AM Alvaro Herrera  wrote:
>
> I propose the comment rewordings as attached.  Mostly, this updates the
> comment atop the function to cover the case being modified, and then the
> new comment just refers to the new explicitly stated policy, so it
> bcomes simpler.
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Problem with default partition pruning

2019-08-06 Thread yuzuko
Hello,

> > Well, if this is really all that duplicative, one thing we could do is
> > run this check in get_partprune_steps_internal only if
> > constraint_exclusion is a value other than on; we should achieve the
> > same effect with no repetition.  Patch for that is attached.  However,
> > if I run the server with constraint_exclusion=on, the regression test
> > fail with the attached diff.  I didn't look at what test is failing, but
> > it seems to me that it's not really duplicative in all cases, only some.
> > Therefore we can't do it.
>
> Right ... One of the failing cases is one that was benefitting from
> constraint_exclusion in the location where we were doing it previously.
>
Thanks for testing.

> I think trying to fix this problem by selectively moving where to apply
> constraint exclusion would be very bug-prone, and hard to detect whether
> we're missing one spot or doing it multiple times in some other cases.
> So I think we shouldn't try.  If this is a real problem, then we should
> add a flag to the reloptinfo and set it when we've done pruning, then
> do nothing if we already did it.  I'm not sure that this is correct, and
> I'm even less sure that it is worth the trouble.
>
Indeed, we should not do that, from a cost-effectiveness standpoint.
I think we can tolerate the duplicate processing, considering it
doesn't happen in all cases.

> In short, I propose to get this done as the patch I posted in
> https://postgr.es/m/20190806133053.GA23706@alvherre.pgsql
>
I agree with your proposal.  Also, I confirmed that a default partition
is pruned as expected with your patch.

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Problem with default partition pruning

2019-08-08 Thread yuzuko
Hello,

On Thu, Aug 8, 2019 at 5:09 PM Amit Langote  wrote:
>
> Hi Simon,
>
> On Thu, Aug 8, 2019 at 4:54 PM Simon Riggs  wrote:
> > On Wed, 7 Aug 2019 at 21:27, Alvaro Herrera wrote:
> >> Well, yes, avoiding that is the point of this commit also: we were
> >> scanning some partitions for some queries, after this patch we're
> >> supposed not to.
> >
> >
> > Understood
> >
> > My concern was about the additional execution time caused when there
> > would be no benefit, especially if the algorithmic cost is O(N) or
> > similar (i.e. worse than O(k))
>
> Note that we apply constraint exclusion only to the (sub-partitioned)
> parent, not to all partitions, so the cost is not O(N) in the number
> of partitions.  The idea is that if the parent is excluded, all of its
> partitions are.  We normally wouldn't need to use constraint exclusion,
> because partition pruning can handle most cases.  What partition
> pruning can't handle sufficiently well though is the case where a
> clause set that contradicts the partition constraint is specified --
> while all non-default partitions are correctly pruned, the default
> partition is not.  Using constraint exclusion is a workaround for that
> deficiency of the partition pruning logic.
>
Besides that, in the latest patch Amit proposed, the additional code is
not executed if no default partition is defined.  So I think the patch
does not cause the overhead Simon is concerned about.

I looked at Amit's patches and found they work correctly.

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Autovacuum on partitioned table

2019-12-02 Thread yuzuko
Hello,

As Greg reported in [1] before, autovacuum ignores partitioned tables.
That is, even if individual partitions’ statistics are updated, the
parent's statistics are not.  This is a TODO item for declarative
partitioning.  As Amit mentioned in [2], a way to build a parent's
statistics from the partitions' statistics without scanning the
partitions would be nice, but it would require a lot of modifications.
So I tried to fix this using the current analyze method.

The summary of the attached patch is as follows:
* If the relation is a partitioned table, check whether its children
  need vacuum or analyze.  Children that do are added to the table list
  for autovacuum.  If at least one child is added to the list, the
  partitioned table is also added.  Then autovacuum runs on all the
  tables in the list.
* If the partitioned table has foreign partitions, they are ignored.

When the parent has children that don't need vacuum/analyze, or foreign
partitions, the parent's stats are updated by scanning the current data
of all children, so old and new stats are mixed within the partition
tree.  Is that acceptable?  Any thoughts?

[1] 
https://www.postgresql.org/message-id/CAM-w4HMQKC8hw7nB9TW3OV%2BhkB5OUcPtvr_U_EiSOjByoa-e4Q%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CA%2BHiwqEeZQ-H2OVbHZ%3Dn2RNNPF84Hygi1HC-MDwC-VnBjpA1%3DQ%40mail.gmail.com

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v1_autovacuum_on_partitioned_table.patch
Description: Binary data


Re: Partitioning versus autovacuum

2019-12-02 Thread yuzuko
Hello Greg,

> At the risk of forking this thread... I think there's actually a
> planner estimation bug here too.
>
I think that is not a bug.  The estimation error occurred because there
were no statistics for the parent.  We should run ANALYZE on the
*partitioned table*.

Here is your test case:
create table p (i integer, j integer) partition by list (i);
create table p0 partition of p for values in (0);
create table p1 partition of p for values in (1);
insert into p select 0,generate_series(1,1000);
insert into p select 1,generate_series(1,1000);
analyze p;

explain analyze select * from q join p using (i) where j between 1 and 500;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.02..54.77 rows=500 width=8) (actual time=0.180..2.960 rows=500 loops=1)
   Hash Cond: (p0.i = q.i)
   ->  Append  (cost=0.00..45.00 rows=1000 width=8) (actual time=0.033..1.887 rows=1000 loops=1)
         ->  Seq Scan on p0  (cost=0.00..20.00 rows=500 width=8) (actual time=0.025..0.524 rows=500 loops=1)
               Filter: ((j >= 1) AND (j <= 500))
               Rows Removed by Filter: 500
         ->  Seq Scan on p1  (cost=0.00..20.00 rows=500 width=8) (actual time=0.014..0.499 rows=500 loops=1)
               Filter: ((j >= 1) AND (j <= 500))
               Rows Removed by Filter: 500
   ->  Hash  (cost=1.01..1.01 rows=1 width=4) (actual time=0.103..0.104 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on q  (cost=0.00..1.01 rows=1 width=4) (actual time=0.072..0.074 rows=1 loops=1)
 Planning Time: 0.835 ms
 Execution Time: 3.310 ms
(14 rows)

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Autovacuum on partitioned table

2019-12-02 Thread yuzuko
Hi Laurenz,

Thanks for the comments.

On Mon, Dec 2, 2019 at 6:19 PM Laurenz Albe  wrote:
>
> On Mon, 2019-12-02 at 18:02 +0900, yuzuko wrote:
> > Greg reported in [1] before, autovacuum ignores partitioned tables.
> > That is, even if individual partitions’ statistics are updated, its parent's
> > statistics are not updated.  This is TODO for declarative partitioning.
> > As Amit mentioned in [2], a way to make parent's statistics from
> > partitions' statistics without scanning the partitions would be nice,
> > but it will need a lot of modifications.  So I tried to fix that using the
> > current analyze method.
> >
> > The summary of the attached patch is as follows:
> > * If the relation is a partitioned table, check its children if they need
> >   vacuum or analyze.  Children need to do that are added to
> >   a table list for autovacuuum.  At least one child is added to the list,
> >   the partitioned table is also added to the list.  Then, autovacuum
> >   runs on all the tables in the list.
>
> That means that all partitions are vacuumed if only one of them needs it,
> right?  This will result in way more vacuuming than necessary.
>
Autovacuum runs only on partitions that need vacuum/analyze, so the
stats of partitions that don't need it are not updated.  However, to
build the parent's stats, all children are scanned.  That might be a
waste of time.

> Wouldn't it be an option to update the partitioned table's statistics
> whenever one of the partitions is vacuumed?
>
> Yours,
> Laurenz Albe
>

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Autovacuum on partitioned table

2019-12-26 Thread yuzuko
Hi,

As Laurenz suggested in this thread, I tried adding an option to update
the parent's statistics during autovacuum.  To do that, I propose
supporting the 'autovacuum_enabled' option, which already exists, on
partitioned tables.

With the attached patch, you can use the 'autovacuum_enabled' option on
a partitioned table as usual; its default value is true.  So if you
don't need autovacuum on a partitioned table, you have to specify the
option explicitly:
CREATE TABLE p(i int) partition by range(i) with (autovacuum_enabled=0);

I wonder, though, whether a suitable default for 'autovacuum_enabled'
on partitioned tables might be false, because autovacuum on
*partitioned tables* requires scanning all children to build the
partitioned table's statistics.  But if the default value varies
according to the kind of relation, would that be confusing?  Any
thoughts?
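For reference, the storage parameter lands in pg_class.reloptions, so
it is easy to check which partitioned tables have it set (this query
works on stock PostgreSQL; relkind 'p' denotes a partitioned table):

SELECT relname, reloptions
  FROM pg_class
 WHERE relkind = 'p' AND reloptions IS NOT NULL;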

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v2_autovacuum_on_partitioned_table.patch
Description: Binary data


Re: Autovacuum on partitioned table

2020-01-28 Thread yuzuko
Hello,

> Besides the complexity of
> getting that infrastructure in place, an important question is whether
> the current system of applying threshold and scale factor to
> changes_since_analyze should be used as-is for inheritance parents
> (partitioned tables), because if users set those parameters similarly
> to for regular tables, autovacuum might analyze partitioned tables
> more than necessary.  We'll either need a different formula, or some
> commentary in the documentation about how partitioned tables might
> need different setting, or maybe both.
>
I'm not sure but I think we need new autovacuum parameters for
partitioned tables (autovacuum, autovacuum_analyze_threshold,
autovacuum_analyze_scale_factor) because whether it's necessary
to run autovacuum on partitioned tables will depend on users.
What do you think?

> How are you going to track changes_since_analyze of partitioned table?
> It's just an idea but we can accumulate changes_since_analyze of
> partitioned table by adding child tables's value after analyzing each
> child table. And compare the partitioned tables value to the threshold
> that is computed by (autovacuum_analyze_threshold  + total rows
> including all child tables * autovacuum_analyze_scale_factor).
>
The idea Sawada-san mentioned is similar to mine.  Also, to track
changes_since_analyze, we have to keep statistics for partitioned
tables.  To do that, we can invent a new PgStat_StatPartitionedTabEntry
based on PgStat_StatTabEntry.  After discussing it with Amit, I think
the new structure needs the following members:

tableid
changes_since_analyze
analyze_timestamp
analyze_count
autovac_analyze_timestamp
autovac_analyze_count

Vacuum doesn't run on partitioned tables, so I think members related to
(auto) vacuum need not be contained in the structure.

I'm still writing a patch.  I'll send it this week.
-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Autovacuum on partitioned table

2020-03-16 Thread yuzuko
Hello,

Thank you for reviewing.

> > > +  */
> > > + if (IsAutoVacuumWorkerProcess() &&
> > > + rel->rd_rel->relispartition &&
> > > + !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
> >
> > I'm not sure I understand why we do this only on autovac. Why not all
> > analyzes?
>
> +1.  If there is a reason, it should at least be documented in the
> comment above.
>
When we analyze a partitioned table with the ANALYZE command, all
inheritors, including the partitioned table itself, are analyzed at the
same time.  In that case, if we call pgstat_report_partanalyze, the
partitioned table's changes_since_analyze is updated according to the
number of analyzed tuples of the partitions, as shown below.  But I
think it should be 0.

\d+ p
                               Partitioned table "public.p"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 i      | integer |           |          |         | plain   |              |
Partition key: RANGE (i)
Partitions: p_1 FOR VALUES FROM (0) TO (100),
            p_2 FOR VALUES FROM (100) TO (200)

insert into p select * from generate_series(0,199);
INSERT 0 200

(before analyze)
-[ RECORD 1 ]---+--
relname | p
n_mod_since_analyze | 0
-[ RECORD 2 ]---+--
relname | p_1
n_mod_since_analyze | 100
-[ RECORD 3 ]---+--
relname | p_2
n_mod_since_analyze | 100

(after analyze)
-[ RECORD 1 ]---+--
relname | p
n_mod_since_analyze | 200
-[ RECORD 2 ]---+--
relname | p_1
n_mod_since_analyze | 0
-[ RECORD 3 ]---+--
relname | p_2
n_mod_since_analyze | 0


I think that if we analyze the partition tree in order from leaf
partitions to the root table, this problem can be fixed.
What do you think?

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Autovacuum on partitioned table

2020-03-17 Thread yuzuko
Hello,

> > > > +  */
> > > > + if (IsAutoVacuumWorkerProcess() &&
> > > > + rel->rd_rel->relispartition &&
> > > > + !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
> > >
> > > I'm not sure I understand why we do this only on autovac. Why not all
> > > analyzes?
> >
> > +1.  If there is a reason, it should at least be documented in the
> > comment above.
> >
> When we analyze a partitioned table with the ANALYZE command, all
> inheritors, including the partitioned table itself, are analyzed at the
> same time.  In that case, if we call pgstat_report_partanalyze, the
> partitioned table's changes_since_analyze is updated according to the
> number of analyzed tuples of the partitions, as shown below.  But I
> think it should be 0.
>
> \d+ p
>                                Partitioned table "public.p"
>  Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
> --------+---------+-----------+----------+---------+---------+--------------+-------------
>  i      | integer |           |          |         | plain   |              |
> Partition key: RANGE (i)
> Partitions: p_1 FOR VALUES FROM (0) TO (100),
>             p_2 FOR VALUES FROM (100) TO (200)
>
> insert into p select * from generate_series(0,199);
> INSERT 0 200
>
> (before analyze)
> -[ RECORD 1 ]---+--
> relname | p
> n_mod_since_analyze | 0
> -[ RECORD 2 ]---+--
> relname | p_1
> n_mod_since_analyze | 100
> -[ RECORD 3 ]---+--
> relname | p_2
> n_mod_since_analyze | 100
>
> (after analyze)
> -[ RECORD 1 ]---+--
> relname | p
> n_mod_since_analyze | 200
> -[ RECORD 2 ]---+--
> relname | p_1
> n_mod_since_analyze | 0
> -[ RECORD 3 ]---+--
> relname | p_2
> n_mod_since_analyze | 0
>
>
> I think that if we analyze the partition tree in order from leaf
> partitions to the root table, this problem can be fixed.
> What do you think?
>

Attached is a new patch that fixes the above problem.  It also includes
modifications according to all the comments Alvaro and Amit made
earlier in this thread.

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v6_autovacuum_on_partitioned_table.patch
Description: Binary data


Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-21 Thread yuzuko
Hi,

Thank you for discussing this item.

> I think we should treat ATTACH/
> DETACH/DROP handling as a further feature to be added in a future
> release, not an open item to be fixed in the current one.
>
I agree with your opinion.

> Now, if somebody sees a very trivial way to handle it, let's discuss it,
> but *I* don't see it.
>
I started thinking about how to handle ATTACH/DETACH/DROP, but I
haven't created a patch yet.  If no one else has done it, I'll keep
working on it.

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Feedback on table expansion hook (including patch)

2021-05-12 Thread yuzuko
Hello,

> Thank you all for the feedback and insights.
>
> Yes, the intention is to *replace* expand_inherited_rtentry() in the same way 
> planner_hook replaces standard_planner().
>

This patch is really useful.  We are developing hypothetical
partitioning as a feature of HypoPG [1][2], and we hit the same problem
as TimescaleDB, so we would also be thrilled to have this hook.

Hypothetical partitioning lets users define multiple partitioning
schemes on real tables and real data hypothetically, and shows the
resulting plan/cost with EXPLAIN as if those schemes were in effect.
Users can quickly check how their queries would behave if some tables
were partitioned, and try different partitioning schemes.  HypoPG
performs table expansion again according to the defined hypothetical
partitioning schemes.  For this purpose we used the get_relation_info
hook, but in PG 12 table expansion was moved, so we can no longer do it
from that hook.  This is exactly the problem Erik has, and the proposed
hook would allow us to support hypothetical partitioning.


[1] https://github.com/HypoPG/hypopg/tree/REL1_STABLE
[2] https://github.com/HypoPG/hypopg/tree/master
-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Autovacuum (analyze) on partitioned tables for ATTACH/DETACH/DROP commands

2021-06-21 Thread yuzuko
Hello,

While discussing auto analyze on partitioned tables, we recognized that
auto analyze should also run on partitioned tables when ATTACH, DETACH
and DROP commands are executed [1].  Partitioned tables are checked for
whether they need auto analyze based on their changes_since_analyze
(the total number of inserts/updates/deletes on their partitions), but
the above DDL operations are not counted at present.

To support the ATTACH, DETACH and DROP commands, I propose the
following (a usage sketch follows the list):
* Add new configuration parameters,
   autovacuum_analyze_attach_partition,
   autovacuum_analyze_detach_partition and
   autovacuum_analyze_drop_partition, to enable/disable this feature.
* When a partition is attached/detached/dropped,
   pgstat_report_anl_ancestors() is called and checks the above
   parameters.  If true, the number of live tuples of the partition is
   counted as changed tuples of its ancestors in
   pgstat_recv_anl_ancestors().
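The sketch mentioned above (assuming the v1 patch is applied; the
parameter name is one of those proposed here, and table p / partition
p_new are hypothetical):

SET autovacuum_analyze_attach_partition = on;
ALTER TABLE p ATTACH PARTITION p_new FOR VALUES FROM (200) TO (300);
-- p_new's live tuples are now counted in p's changes_since_analyze,
-- so a following autovacuum cycle will consider analyzing p.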

Attached is the v1 patch.  What do you think?

[1] 
https://www.postgresql.org/message-id/ce5c3f04-fc17-7139-fffc-037f2c981bec%40enterprisedb.com
-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v1_autovacuum_for_attach_detach_drop_commands.patch
Description: Binary data


Re: Autovacuum on partitioned table (autoanalyze)

2020-07-06 Thread yuzuko
> On Wed, Jul 1, 2020 at 6:26 PM Daniel Gustafsson  wrote:
>
> > On 21 Apr 2020, at 18:21, yuzuko  wrote:
>
> > I'll update the patch soon.
>
> Do you have an updated version to submit?  The previous patch no longer 
> applies
> to HEAD, so I'm marking this entry Waiting on Author in the meantime.
>
Thank you for letting me know.
I attached the latest patch, which applies to HEAD.

I think there are other approaches, like Tom's idea that Justin
referenced earlier, but this patch works the same way as the previous
ones: it tracks updated/inserted/deleted tuples and checks whether
partitioned tables need auto-analyze, just as for non-partitioned
tables.  I wanted partitioned tables to be analyzable by autovacuum as
a first step, and I think this approach is the simplest way to do it.
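For context, the analyze check being reused is the stock formula (the
same one visible in the autovacuum.c excerpts quoted elsewhere in this
thread): analyze is triggered when

  n_mod_since_analyze > autovacuum_analyze_threshold
                        + autovacuum_analyze_scale_factor * reltuples

SHOW autovacuum_analyze_threshold;     -- 50 by default
SHOW autovacuum_analyze_scale_factor;  -- 0.1 by default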

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v8_autovacuum_on_partitioned_table.patch
Description: Binary data


Re: Autovacuum on partitioned table

2020-04-06 Thread yuzuko
Hi Alvaro,
Thank you for your comments.

> I'm confused about some error messages in the regression test when a
> column is mentioned twice, that changed from mentioning the table named
> in the vacuum command, to mentioning the first partition.  Is that
> because you changed an lappend() to lcons()?  I think you do this so
> that the counters accumulate for the topmost parent that will be
> processed at the end.  I'm not sure I like that too much ... I think
> that needs more thought.
>
I couldn't come up with a way to count changes_since_analyze precisely
when a partition tree is analyzed with the ANALYZE command under this
approach (updating all ancestors' changes_since_analyze according to
the number of analyzed tuples in the leaf partitions).

So I tried another approach to running autovacuum on partitioned
tables.  In this approach, all ancestors' changed_tuples are updated at
transaction commit (in AtEOXact_PgStat) according to the number of
inserted/updated/deleted tuples in the leaf partitions.

Attached is the latest patch.  What do you think?
--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v7_autovacuum_on_partitioned_table.patch
Description: Binary data


Re: Autovacuum on partitioned table (autoanalyze)

2020-04-16 Thread yuzuko
Hi Justin,

Thank you for your comments.

On Tue, Apr 7, 2020 at 12:32 PM Justin Pryzby  wrote:
>
> Not sure if you saw my earlier message ?
>
I'm sorry, I didn't notice for a while.

> I think it ought to be possible to configure this feature such that an
> auto-analyze on any child partition would trigger analyze of the parent.  I
> think that would be important for maintaining accurate stats of the partition
> key column for many cases involving RANGE-partitioned tables, which are likely
> to rely on histogram rather than MCVs.
>
I read your previous email and understand that it would be necessary to
analyze partitioned tables automatically when any of their children are
analyzed.  In my first patch, auto-analyze on partitioned tables worked
like this, but there were comments about autovacuum performance,
especially when partitioned tables have many children.

The latest patch lets users set a different autovacuum configuration
for each partitioned table, like this:
  create table p3(i int) partition by range(i) with
   (autovacuum_analyze_scale_factor=0.0005, autovacuum_analyze_threshold=100);
so users can configure those parameters according to their partitioning
strategies and other requirements.

So I think this patch can solve the problem you mentioned.

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Autovacuum on partitioned table (autoanalyze)

2020-04-21 Thread yuzuko
Hello,

On Sat, Apr 18, 2020 at 2:08 PM Justin Pryzby  wrote:
>
> On Fri, Apr 17, 2020 at 10:09:07PM +0900, Amit Langote wrote:
> > On Thu, Apr 16, 2020 at 11:19 PM Justin Pryzby  wrote:
> > > On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote:
> > > I don't think that adequately allows what's needed.
> ...(paragraph with my typos elided)...
> > > For example, say a new customer has bunch of partitioned tables which each
> > > currently have only one partition (for the current month), and that's 
> > > expected
> > > to grow to at least 20+ partitions (2+ years of history).  How does one 
> > > set the
> > > partitioned table's auto-analyze parameters to analyze whenever any child 
> > > is
> > > analyzed ?  I don't think it should be needed to update it every month 
> > > after
> > > computing sum(child tuples).
> > >
> > > Possibly you could allow that behavior for some special values of the
> > > threshold.  Like if autovacuum_analyze_threshold=-2, then analyze the 
> > > parent
> > > whenever any of its children are analyzed.
> > >
> > > I think that use case and that need would be common, but I'd like to hear 
> > > what
> > > others think.
> >
> > Having to constantly pay attention to whether a parent's
> > analyze_threshold/scale_factor is working as intended would surely be
> > an annoyance, so I tend to agree that we might need more than just the
> > ability to set analyze_threshold/scale_factor on parent tables.
> > However, I think we can at least start with being able to do
> > *something* here. :)  Maybe others think that this shouldn't be
> > considered committable until we figure out a good analyze threshold
> > calculation formula to apply to parent tables.
> >
> > Considering that, how about having, say, a
> > autovacuum_analyze_partition_parent_frequency, with string values
> > 'default', 'partition'? -- 'default' assumes the same formula as
> > regular tables, whereas with 'partition', parent is analyzed as soon
> > as a partition is.
>
> I assume you mean a reloption to be applied only to partitioned tables,
>
> Your "partition" setting would mean that the scale/threshold values would have
> no effect, which seems kind of unfortunate.
>
> I think it should be called something else, and done differently, like maybe:
> autovacuum_analyze_mode = {off,sum,max,...}
>
Would the reloption you suggested above be applied to all tables?
Users might not use it for leaf partitions, so I think we should add
"parent" to the reloption's name, as Amit suggested.

> The threshold would be threshold + scale*tuples, as always, but would be
> compared with f(changes) as determined by the relopt.
>
> sum(changes) would do what you called "default", comparing the sum(changes)
> across all partitions to the threshold, which is itself computed using
> sum(reltuples) AS reltuples.
>
> max(changes) would compute max(changes) compared to the threshold, and the
> threshold would be computed separately for each partition's reltuples:
> threshold_N = parent_threshold + parent_scale * part_N_tuples.  If *any*
> partition exceeds that threshold, the partition itself is analyzed.  This
> allows what I want for time-series.  Maybe this would have an alias called
> "any".
>
I may be wrong, but I think the formula
> threshold_N = parent_threshold + parent_scale * part_N_tuples
should use the ordinary table's threshold, not the parent's.  If it uses
parent_threshold, the parent might not be analyzed even when one of its
partitions is, namely when parent_threshold is larger than the normal
threshold.  I'm worried about whether that case
meets the requirements for time-series partitioning.
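
For instance (made-up numbers): with the default autovacuum_analyze_threshold
= 50 and autovacuum_analyze_scale_factor = 0.1, a partition holding 10,000
tuples is auto-analyzed after 50 + 0.1 * 10,000 = 1,050 changes, and its
counter then resets.  If the parent had parent_threshold = 5,000 and
parent_scale = 0.1, its per-partition threshold would be 5,000 + 0.1 * 10,000
= 6,000 changes, which the partition's counter can never reach, so the parent
would never be analyzed.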

> I'm not sure if there's any other useful modes, like avg(changes)?  I guess we
> can add them later if someone thinks of a good use case.
>
> Also, for me, the v7 patch warns:
> |src/backend/postmaster/autovacuum.c:3117:70: warning: ‘reltuples’ may be 
> used uninitialized in this function [-Wmaybe-uninitialized]
> |   vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * 
> reltuples;
> ..which seems to be a false positive, but easily avoided.
>
Thank you for testing the patch.
I got it.  I'll update the patch soon.

>
> This patch includes partitioned tables in pg_stat_*_tables, which is great; I
> complained awhile ago that they were missing [0].  It might be useful if that
> part was split out into a separate 0001 patch (?).
>
If partitioned tables' statistics are used for other purposes, I think it
would be better to split the patch.  Does anyone have an opinion?

---
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Autovacuum on partitioned table

2020-02-19 Thread yuzuko
Hello,

I'm sorry for the delay.
Attached is the latest patch, based on the discussion in this thread.

> > Yeah that is what I meant. In addition, adding partition's
> > changes_since_analyze to its parent needs to be done recursively as
> > the parent table could also be a partitioned table.
>
> That's a good point.  So, changes_since_analyze increments are
> essentially propagated from leaf partitions to all the way up to the
> root table, including any intermediate partitioned tables.  We'll need
> to consider whether we should propagate only one level at a time (from
> bottom of the tree) or update all parents up to the root, every time a
> leaf partition is analyzed.

With this patch, for multi-level partitioning, all ancestors'
changes_since_analyze counters are updated whenever a leaf partition is
analyzed.
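
A minimal sketch of what this means (hypothetical table names):

create table root (i int) partition by range (i);
create table mid partition of root for values from (0) to (1000)
  partition by range (i);
create table leaf partition of mid for values from (0) to (500);
-- under the patch, analyzing "leaf" propagates its changes_since_analyze
-- to both "mid" and "root"
analyze leaf;
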
Could you please check the patch again?

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v3_autovacuum_on_partitioned_table.patch
Description: Binary data


Re: Autovacuum on partitioned table

2020-02-20 Thread yuzuko
Hello Amit-san,

Thanks for your comments.

> * White-space noise in the diff (space used where tab is expected);
> please check with git diff --check and fix.
Fixed it.

> * Names changes_tuples, m_changes_tuples should be changed_tuples and
> m_changed_tuples, respectively?
Yes, I modified it.

> * Did you intend to make it so that we now report *all* inherited
> stats to the stats collector, not just those for partitioned tables?
> IOW, do did you intend the new feature to also cover traditional
> inheritance parents? I am talking about the following diff:
>
I modified it as follows so that this feature applies only to declarative partitioning.

- if (!inh)
-  pgstat_report_analyze(onerel, totalrows, totaldeadrows,
- (va_cols == NIL));
+ if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ pgstat_report_analyze(onerel, totalrows, totaldeadrows,
+(va_cols == NIL));


> Having read the relevant diffs again, I think this could be done
> without duplicating code too much.  You seem to have added the same
> logic in two places: do_autovacuum() and table_recheck_autovac().
> More importantly, part of the logic of relation_needs_vacanalyze() is
> duplicated in both of the aforementioned places, which I think is
> unnecessary and undesirable if you consider maintainability. I think
> we could just add the logic to compute reltuples for partitioned
> tables at the beginning of relation_needs_vacanalyze() and be done.
>
Yes, indeed.  Partitioned tables don't need to be vacuumed, so I had added a
separate check for partitioned tables outside relation_needs_vacanalyze().
However, a partitioned table's tabentry->n_dead_tuples is always 0, so
dovacuum is always false; checking both the vacuum and analyze conditions
for partitioned tables therefore does no harm.  I merged v3_amit_delta.patch
into the new patch and found a minor bug: the partitioned table's reltuples
was overwritten with its classForm->reltuples, so I fixed it.

Also, I think partitioned tables' changes_since_analyze should be reported
only by the autovacuum process, so I fixed that too.

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v4_autovacuum_on_partitioned_table.patch
Description: Binary data


Re: Autovacuum on partitioned table

2020-02-25 Thread yuzuko
Hi,

Thanks for reviewing the patch.

> > We can make it work correctly but I think perhaps we can skip updating
> > statistics values of partitioned tables other than n_mod_since_analyze
> > as the first step. Because if we support also n_live_tup and
> > n_dead_tup, user might get confused that other statistics values such
> > as seq_scan, seq_tup_read however are not supported.
>
> +1, that makes sense.
>
Yes, indeed.  I modified it so that no statistics other than
n_mod_since_analyze are updated.
Attached is the v5 patch.  In this patch, pgstat_report_analyze() always
reports 0 for msg.m_live_tuples and m_dead_tuples when the relation is
partitioned.

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v5_autovacuum_on_partitioned_table.patch
Description: Binary data


Re: Problem with default partition pruning

2019-06-24 Thread yuzuko
Hello Shawn, Alvaro,

Thank you for testing the patches and for your comments.
Yes, there are two patches:
(1) v4_default_partition_pruning.patch fixes problems with default
partition pruning,
and (2) v3_ignore_contradictory_where_clauses_at_partprune_step.patch
determines if a given clause contradicts a sub-partitioned table's
partition constraint.
I'll post the two patches together next time.

Anyway, I'll rebase both patches onto master and fix the whitespace.

Regards,
Yuzuko Hosoya

On Mon, Jun 24, 2019 at 12:45 PM Alvaro Herrera
 wrote:
>
> On 2019-Jun-24, shawn wang wrote:
>
> Hello,
>
> > Thank you for your reply.
> > You can see that the mail start time is February 22. So I looked at the
> > latest version at that time. I found that v11.2 was the newest branch at
> > the time. So I tried to merge this patch into the code, and I found that
> > everything worked.
>
> I see -- I only tried master, didn't occur to me to try it against
> REL_11_STABLE.
>
> > So I tested on this branch and got the results.
> > You need to add the v4_default_partition_pruning.patch
> > <https://www.postgresql.org/message-id/attachment/100463/v4_default_partition_pruning.patch>
> > first,
> > and then add the
> > v3_ignore_contradictory_where_clauses_at_partprune_step.patch
> > <https://www.postgresql.org/message-id/attachment/100591/v3_ignore_contradictory_where_clauses_at_partprune_step.patch>
>
> Oh, so there are two patches?  It's easier to keep track if they're
> always posted together.  Anyway, I may have some time to have a look
> tomorrow (Monday).
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Problem with default partition pruning

2019-06-26 Thread yuzuko
Hello,

On Tue, Jun 25, 2019 at 1:45 PM yuzuko  wrote:
>
> Hello Shawn, Alvaro,
>
> Thank you for testing patches and comments.
> Yes, there are two patches:
> (1) v4_default_partition_pruning.patch fixes problems with default
> partition pruning
> and (2) v3_ignore_contradictory_where_clauses_at_partprune_step.patch 
> determines
> if a given clause contradicts a sub-partitioned table's partition constraint.
> I'll post two patches together next time.
>
> Anyway, I'll rebase two patches to apply on master and fix space.
>

Attached are the latest patches discussed in this thread.  I rebased the
second patch (v5_ignore_contradictory_where_clauses_at_partprune_step.patch)
on the current master.  Could you please check them again?

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v4_default_partition_pruning.patch
Description: Binary data


v5_ignore_contradictory_where_clauses_at_partprune_step.patch
Description: Binary data


Re: Autovacuum on partitioned table (autoanalyze)

2020-09-17 Thread yuzuko
 Horiguchi-san,

Thank you for reviewing.


On Tue, Sep 15, 2020 at 7:01 PM Kyotaro Horiguchi
 wrote:
>
> At Tue, 25 Aug 2020 14:28:20 +0200, Daniel Gustafsson  wrote 
> in
> > > I attach the latest patch that solves the above Werror.
> > > Could you please check it again?
> >
> > This version now pass the tests in the Travis pipeline as can be seen in the
> > link below, and is ready to be reviewed in the upcoming commitfest:
> >
> >   http://cfbot.cputube.org/yuzuko-hosoya.html
>
> At Mon, 6 Jul 2020 19:35:37 +0900, yuzuko  wrote in
> > I think there are other approaches like Tom's idea that Justin previously
> > referenced, but this patch works the same way as previous patches.
> > (tracks updated/inserted/deleted tuples and checks whether the partitioned
> > tables need auto-analyze, same as nonpartitioned tables)
> > Because I wanted to be able to analyze partitioned tables by autovacuum
> > as a first step, and I think this approach is the simplest way to do it.
>
> I'm not sure if anything bad happens if parent and children do not
> agree on statistics.
>
> The requirement suggested here seems to be:
>
> - We want to update parent's stats when any of its children gets its
>   stats updated. This is crucial especially for time-series
>   partitioning.
>
> - However, we don't want to analyze the whole tree every time any of
>   the children is analyzed.
>
> To achieve both, stats-merging seems to be the optimal solution.
>
> Putting that aside, I had a brief look on the latest patch.
>
> /* We only count stats for things that have storage */
> -   if (!RELKIND_HAS_STORAGE(relkind))
> +   if (!RELKIND_HAS_STORAGE(relkind) ||
> +   relkind == RELKIND_PARTITIONED_TABLE)
> {
> rel->pgstat_info = NULL;
>
> RELKIND_HAS_STORAGE(RELKIND_PARTITIONED_TABLE) is already false.
> Maybe you wanted to do "&& relkind !=" instead:p
>
Oh, indeed.  I'll fix it.

>
> +   /*
> +* If this relation is partitioned, we store all ancestors' 
> oid
> +* to propagate its changed_tuples to their parents when this
> +* transaction is committed.
> +*/
> +   if (rel->rd_rel->relispartition && pgstat_info->ancestors == 
> NULL)
>
> If the relation was detached then attached to another partition within
> a transaction, the ancestor list would get stale and the succeeding
> modification to the relation propagates into wrong ancestors.
>
> I think vacuum time is more appropriate to modify ancestors' stats. It
> seems to me that what Alvaro pointed out is the list-order-susceptible
> manner of collecting children's modified tuples.
>
I previously proposed a patch that modified ancestors' stats at vacuum time.
Back then, following comments from Alvaro and Amit, I tried updating the
parents' changes_since_analyze in every ANALYZE.  However, in that case the
problem mentioned in [1] occurred, and I could not find a way to avoid it.
I think it can be solved by updating the parents' changes_since_analyze
only in the case of autoanalyze, but what do you think?

>
> +   ? 0  /* partitioned tables don't have any data, so it's 0 */
>
> If the comment is true, we shouldn't have non-zero t_changed_tuples,
> too. I think the reason for the lines is something different.
>
Yes, certainly.  I think updating the values of live_tuples and dead_tuples
would be confusing for users.  I'll come up with a different comment.


[1] 
https://www.postgresql.org/message-id/CAKkQ50-bwFEDMBGb1JmDXffXsiU8xk-hN6kJK9CKjdBa7r%3DHdw%40mail.gmail.com
--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Autovacuum on partitioned table (autoanalyze)

2020-10-22 Thread yuzuko
Hello,

I reconsidered an approach based on the v5 patch, in line with
Horiguchi-san's comments.

This approach is as follows:
- A partitioned table is checked for whether it needs analyze, like a
  plain table, in relation_needs_vacanalyze().  To do this, we should
  store the partitioned table's stats (changes_since_analyze).
- A partitioned table's changes_since_analyze is updated when a leaf
  partition is analyzed, by propagating the leaf's changes_since_analyze.
  At the next scheduled analyze time it is used in the above check; that
  is, the partitioned table is analyzed after its leaf partitions.
- The propagation differs between autoanalyze and plain ANALYZE.  In
  autoanalyze, a leaf partition's changes_since_analyze is propagated to
  *all* ancestors.  In a plain ANALYZE on an inheritance tree, it is
  propagated only to ancestors not included in that tree, to avoid
  needless counting (see the example below).
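
(For example, a plain ANALYZE on the root of a tree already processes every
partitioned table in that tree itself, so the leaves' counters need not be
propagated to any ancestor within the tree; propagating them would make
those ancestors look like they need another analyze right away.)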

Attached is the latest patch.
Could you please check it again?

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v10_autovacuum_on_partitioned_table.patch
Description: Binary data


RE: Problem with default partition pruning

2019-03-14 Thread Yuzuko Hosoya
Hi Thibaut,

Thanks a lot for your test and comments.

> 
> Le 28/02/2019 à 09:26, Imai, Yoshikazu a écrit :
> > Hosoya-san
> >
> > On Wed, Feb 27, 2019 at 6:51 AM, Yuzuko Hosoya wrote:
> >>> From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> >>> Sent: Wednesday, February 27, 2019 11:22 AM
> >>>
> >>> Hosoya-san,
> >>>
> >>> On 2019/02/22 17:14, Yuzuko Hosoya wrote:
> >>>> Hi,
> >>>>
> >>>> I found the bug of default partition pruning when executing a range
> >> query.
> >>>> -
> >>>> postgres=# create table test1(id int, val text) partition by range
> >>>> (id); postgres=# create table test1_1 partition of test1 for values
> >>>> from (0) to (100); postgres=# create table test1_2 partition of
> >>>> test1 for values from (150) to (200); postgres=# create table
> >>>> test1_def partition of test1 default;
> >>>>
> >>>> postgres=# explain select * from test1 where id > 0 and id < 30;
> >>>>QUERY PLAN
> >>>> 
> >>>>  Append  (cost=0.00..11.83 rows=59 width=11)
> >>>>->  Seq Scan on test1_1  (cost=0.00..5.00 rows=58 width=11)
> >>>>  Filter: ((id > 0) AND (id < 30))
> >>>>->  Seq Scan on test1_def  (cost=0.00..6.53 rows=1 width=12)
> >>>>  Filter: ((id > 0) AND (id < 30))
> >>>> (5 rows)
> >>>>
> >>>> There is no need to scan the default partition, but it's scanned.
> >>>> -
> >>>>
> >>>> In the current implement, whether the default partition is scanned
> >>>> or not is determined according to each condition of given WHERE
> >>>> clause at get_matching_range_bounds().  In this example,
> >>>> scan_default is set true according to id > 0 because id >= 200
> >>>> matches the default partition.  Similarly, according to id < 30,
> >> scan_default is set true.
> >>>> Then, these results are combined according to AND/OR at
> >> perform_pruning_combine_step().
> >>>> In this case, final result's scan_default is set true.
> >>>>
> >>>> The modifications I made are as follows:
> >>>> - get_matching_range_bounds() determines only offsets of range bounds
> >>>>   according to each condition
> >>>> - These results are combined at perform_pruning_combine_step()
> >>>> - Whether the default partition is scanned or not is determined at
> >>>>   get_matching_partitions()
> >>>>
> >>>> Attached the patch.  Any feedback is greatly appreciated.
> >>> Thank you for reporting.  Can you please add this to March CF in
> >>> Bugs category so as not to lose
> >> track
> >>> of this?
> >>>
> >>> I will try to send review comments soon.
> >>>
> >> Thank you for your reply.  I added this to March CF.
> > I tested with simple use case and I confirmed it works correctly like below.
> >
> > In case using between clause:
> > postgres=# create table test1(id int, val text) partition by range
> > (id); postgres=# create table test1_1 partition of test1 for values
> > from (0) to (100); postgres=# create table test1_2 partition of test1
> > for values from (150) to (200); postgres=# create table test1_def
> > partition of test1 default;
> >
> > [HEAD]
> > postgres=# explain analyze select * from test1 where id between 0 and 50;
> > QUERY PLAN
> > --
> > -
> >  Append  (cost=0.00..58.16 rows=12 width=36) (actual time=0.008..0.008 
> > rows=0 loops=1)
> >->  Seq Scan on test1_1  (cost=0.00..29.05 rows=6 width=36) (actual 
> > time=0.005..0.005
> rows=0 loops=1)
> >  Filter: ((id >= 0) AND (id <= 50))
> >->  Seq Scan on test1_def  (cost=0.00..29.05 rows=6 width=36) (actual
> time=0.002..0.002 rows=0 loops=1)
> >  Filter: ((id >= 0) AND (id <= 50))
> >
> >
> > [patched]
> > postgres=# explain analyze select * from test1 where id between 0 and 50;
> >QUERY PLAN
> > ---

RE: Problem with default partition pruning

2019-03-19 Thread Yuzuko Hosoya
Hi Amit-san,

From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
Sent: Monday, March 18, 2019 6:44 PM
 
> Hosoya-san,
> 
> On 2019/03/15 15:05, Yuzuko Hosoya wrote:
> > Indeed, it's problematic.  I also ran tests and found that this
> > problem occurred when no partition matched the WHERE clause.
> > So the following query didn't work correctly.
> >
> > # explain select * from test1_3 where (id > 0 and id < 30);
> >QUERY PLAN
> > -
> >  Append  (cost=0.00..58.16 rows=12 width=36)
> >->  Seq Scan on test1_3_1  (cost=0.00..29.05 rows=6 width=36)
> >  Filter: ((id > 0) AND (id < 30))
> >->  Seq Scan on test1_3_2  (cost=0.00..29.05 rows=6 width=36)
> >  Filter: ((id > 0) AND (id < 30))
> > (5 rows)
> >
> > I created a new patch to handle this problem, and confirmed the query
> > you mentioned works as expected
> >
> > # explain select * from test1 where (id > 0 and id < 30) or (id > 220 and 
> > id < 230);
> > QUERY PLAN
> > --
> > -  Append  (cost=0.00..70.93 rows=26 width=36)
> >->  Seq Scan on test1_1_1  (cost=0.00..35.40 rows=13 width=36)
> >  Filter: (((id > 0) AND (id < 30)) OR ((id > 220) AND (id < 230)))
> >->  Seq Scan on test1_3_1  (cost=0.00..35.40 rows=13 width=36)
> >  Filter: (((id > 0) AND (id < 30)) OR ((id > 220) AND (id <
> > 230)))
> > (5 rows)
> >
> > v2 patch attached.
> > Could you please check it again?
> 
> I think the updated patch breaks the promise that get_matching_range_bounds 
> won't set scan_default
> based on individual pruning value comparisons.  How about the attached delta 
> patch that applies on
> top of your earlier v1 patch, which fixes the issue reported by Thibaut?
> 
Indeed.  I agree with your proposal.
Also, I confirmed that your patch works correctly.

Best regards,
Yuzuko Hosoya





RE: Problem with default partition pruning

2019-03-21 Thread Yuzuko Hosoya
Hi,

Thanks a lot for the additional tests and the new patch.


> Le 20/03/2019 à 10:06, Amit Langote a écrit :
> > Hi Thibaut,
> >
> > On 2019/03/19 23:58, Thibaut Madelaine wrote:
> >> I kept on testing with sub-partitioning.
> > Thanks.
> >
> >> I found a case, using 2 default partitions, where a default partition
> >> is not pruned:
> >>
> >> --
> >>
> >> create table test2(id int, val text) partition by range (id); create
> >> table test2_20_plus_def partition of test2 default; create table
> >> test2_0_20 partition of test2 for values from (0) to (20)
> >>   partition by range (id);
> >> create table test2_0_10 partition of test2_0_20 for values from (0)
> >> to (10); create table test2_10_20_def partition of test2_0_20
> >> default;
> >>
> >> # explain (costs off) select * from test2 where id=5 or id=25;
> >>QUERY PLAN
> >> -
> >>  Append
> >>->  Seq Scan on test2_0_10
> >>  Filter: ((id = 5) OR (id = 25))
> >>->  Seq Scan on test2_10_20_def
> >>  Filter: ((id = 5) OR (id = 25))
> >>->  Seq Scan on test2_20_plus_def
> >>  Filter: ((id = 5) OR (id = 25))
> >> (7 rows)
> >>
> >> --
> >>
> >> I have the same output using Amit's v1-delta.patch or Hosoya's
> >> v2_default_partition_pruning.patch.
> > I think I've figured what may be wrong.
> >
> > Partition pruning step generation code should ignore any arguments of
> > an OR clause that won't be true for a sub-partitioned partition, given
> > its partition constraint.
> >
> > In this case, id = 25 contradicts test2_0_20's partition constraint
> > (which is, a IS NOT NULL AND a >= 0 AND a < 20), so the OR clause
> > should really be simplified to id = 5, ignoring the id = 25 argument.
> > Note that we remove id = 25 only for the considerations of pruning and
> > not from the actual clause that's passed to the final plan, although
> > it wouldn't be a bad idea to try to do that.
> >
> > Attached revised delta patch, which includes the fix described above.
> >
> > Thanks,
> > Amit
> Amit, I tested many cases with nested range sub-partitions... and I did not 
> find any problem with your
> last patch  :-)
> 
> I tried mixing with hash partitions with no problems.
> 
> From the patch, there seems to be less checks than before. I cannot think of 
> a case that can have
> performance impacts.
> 
> Hosoya-san, if you agree with Amit's proposal, do you think you can send a 
> patch unifying your
> default_partition_pruning.patch and Amit's second v1-delta.patch?
>

I understand Amit's proposal.  But I think the issue Thibaut reported can
occur whether or not the clauses contain OR, as follows.
I tested a query that should output "One-Time Filter: false".

# explain select * from test2_0_20 where id = 25;
  QUERY PLAN   
-----------------------------------------------------------------------
 Append  (cost=0.00..25.91 rows=6 width=36)
   ->  Seq Scan on test2_10_20_def  (cost=0.00..25.88 rows=6 width=36)
 Filter: (id = 25)


As Amit described in the previous email, id = 25 contradicts test2_0_20's
partition constraint, so I think this clause should be ignored, and we can
handle this case in a similar way to Amit's proposal.

I attached v1-delta-2.patch, which fixes the above issue.

What do you think?


Best regards,
Yuzuko Hosoya


v1-delta-2.patch
Description: Binary data


RE: Problem with default partition pruning

2019-03-24 Thread Yuzuko Hosoya
Hi,

> 
> Hi,
> 
> On 2019/03/23 2:36, Thibaut Madelaine wrote:
> > I tested your last patch and if I didn't mix up patches on the end of
> > a too long week, I get a problem when querying the sub-sub partition:
> >
> > test=# explain select * from test2_0_10 where id = 25;
> >  QUERY PLAN
> > 
> >  Seq Scan on test2_0_10  (cost=0.00..25.88 rows=6 width=36)
> >Filter: (id = 25)
> > (2 rows)
> 
> The problem here is not really related to partition pruning, but another 
> problem I recently sent an
> email about:
> 
> https://www.postgresql.org/message-id/9813f079-f16b-61c8-9ab7-4363cab28d80%40lab.ntt.co.jp
> 
> The problem in this case is that *constraint exclusion* is not working, 
> because partition constraint
> is not loaded by the planner.  Note that pruning is only used if a query 
> specifies the parent table,
> not a partition.

Thanks for the comments.

I saw that email.  Also, I checked that the query Thibaut mentioned works
correctly with Amit's patch discussed in that thread.


Best regards,
Yuzuko Hosoya





RE: Problem with default partition pruning

2019-04-01 Thread Yuzuko Hosoya
Hi,

> Maybe we should have two patches as we seem to be improving two things:
> 
> 1. Patch to fix problems with default partition pruning originally reported 
> by Hosoya-san
> 
> 2. Patch to determine if a given clause contradicts a sub-partitioned table's 
> partition constraint,
> fixing problems unearthed by Thibaut's tests

I attached the latest patches according to Amit's comments.
v3_default_partition_pruning.patch fixes the default partition pruning
problems, and ignore_contradictory_where_clauses_at_partprune_step.patch
fixes the sub-partition problems Thibaut tested.

Best regards,
Yuzuko Hosoya


ignore_contradictory_where_clauses_at_partprune_step.patch
Description: Binary data


v3_default_partition_pruning.patch
Description: Binary data


RE: Problem with default partition pruning

2019-04-03 Thread Yuzuko Hosoya
Amit-san,

Thanks for the comments.

> 
> Thanks for dividing patches that way.
> 
> Would it be a good idea to add some new test cases to these patches, just so 
> it's easily apparent what
> we're changing?
Yes, I agree with you.

> 
> So, we could add the test case presented by Thibaut at the following link to 
> the
> default_partition_pruning.patch:
> 
> https://www.postgresql.org/message-id/a4968068-6401-7a9c-8bd4-6a3bc9164a86%40dalibo.com
>
> And, another reported at the following link to
> ignore_contradictory_where_clauses_at_partprune_step.patch:
> 
> https://www.postgresql.org/message-id/bd03f475-30d4-c4d0-3d7f-d2fbde755971%40dalibo.com
> 
> Actually, it might be possible/better to construct the test queries in 
> partition_prune.sql using the
> existing tables in that script, that is, without defining new tables just for 
> adding the new test cases.
> If not, maybe it's OK to create the new tables too.
>
I see.  I added some test cases to each patch according to the tests
discussed in this thread.

However, I found another problem, as follows.  This query should output
"One-Time Filter: false" because rlp3's constraint contradicts the WHERE
clause.

-
postgres=# \d+ rlp3
                                Partitioned table "public.rlp3"
 Column |       Type        | Collation | Nullable | Default | Storage  | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
 b      | character varying |           |          |         | extended |              |
 a      | integer           |           |          |         | plain    |              |
Partition of: rlp FOR VALUES FROM (15) TO (20)
Partition constraint: ((a IS NOT NULL) AND (a >= 15) AND (a < 20))
Partition key: LIST (b varchar_ops)
Partitions: rlp3abcd FOR VALUES IN ('ab', 'cd'),
rlp3efgh FOR VALUES IN ('ef', 'gh'),
rlp3nullxy FOR VALUES IN (NULL, 'xy'),
rlp3_default DEFAULT

postgres=# explain select * from rlp3 where a = 2;
                           QUERY PLAN
----------------------------------------------------------------
 Append  (cost=0.00..103.62 rows=24 width=36)
   ->  Seq Scan on rlp3abcd  (cost=0.00..25.88 rows=6 width=36)
 Filter: (a = 2)
   ->  Seq Scan on rlp3efgh  (cost=0.00..25.88 rows=6 width=36)
 Filter: (a = 2)
   ->  Seq Scan on rlp3nullxy  (cost=0.00..25.88 rows=6 width=36)
 Filter: (a = 2)
   ->  Seq Scan on rlp3_default  (cost=0.00..25.88 rows=6 width=36)
 Filter: (a = 2)
(9 rows)
-

I think the contradiction check was done in the wrong place in
ignore_contradictory_where_clauses_at_partprune_step.patch, so I fixed it.

Attached are the latest patches.  Please check them again.

Best regards,
Yuzuko Hosoya


v2_ignore_contradictory_where_clauses_at_partprune_step.patch
Description: Binary data


v4_default_partition_pruning.patch
Description: Binary data


RE: Problem with default partition pruning

2019-04-08 Thread Yuzuko Hosoya
Amit-san,

> -Original Message-
> From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> Sent: Friday, April 05, 2019 6:47 PM
> To: Yuzuko Hosoya ; 'Thibaut' 
> ; 'Imai,
> Yoshikazu' 
> Cc: 'PostgreSQL Hackers' 
> Subject: Re: Problem with default partition pruning
> 
> Hosoya-san,
> 
> 
> On 2019/04/04 13:00, Yuzuko Hosoya wrote:
> > I added some test cases to each patch according to tests discussed in
> > this thread.
> 
> Thanks a lot.
> 
> > However, I found another problem as follows. This query should output
> > "One-Time Filter: false" because rlp3's constraints contradict WHERE
> > clause.
> >
> > -
> > postgres=# \d+ rlp3
> >                            Partitioned table "public.rlp3"
> >  Column |       Type        | Collation | Nullable | Default | Storage  | Stats target | Description
> > --------+-------------------+-----------+----------+---------+----------+--------------+-------------
> >  b      | character varying |           |          |         | extended |              |
> >  a      | integer           |           |          |         | plain    |              |
> > Partition of: rlp FOR VALUES FROM (15) TO (20)
> > Partition constraint: ((a IS NOT NULL) AND (a >= 15) AND (a < 20))
> > Partition key: LIST (b varchar_ops)
> > Partitions: rlp3abcd FOR VALUES IN ('ab', 'cd'),
> > rlp3efgh FOR VALUES IN ('ef', 'gh'),
> > rlp3nullxy FOR VALUES IN (NULL, 'xy'),
> > rlp3_default DEFAULT
> >
> > postgres=# explain select * from rlp3 where a = 2;
> >  QUERY PLAN
> > 
> >  Append  (cost=0.00..103.62 rows=24 width=36)
> >->  Seq Scan on rlp3abcd  (cost=0.00..25.88 rows=6 width=36)
> >  Filter: (a = 2)
> >->  Seq Scan on rlp3efgh  (cost=0.00..25.88 rows=6 width=36)
> >  Filter: (a = 2)
> >->  Seq Scan on rlp3nullxy  (cost=0.00..25.88 rows=6 width=36)
> >  Filter: (a = 2)
> >->  Seq Scan on rlp3_default  (cost=0.00..25.88 rows=6 width=36)
> >  Filter: (a = 2)
> > (9 rows)
> > -
> 
> This one too would be solved with the other patch I mentioned to fix
> get_relation_info() to load the partition constraint so that constraint 
> exclusion can use it.
> Partition in the earlier example given by Thibaut is a leaf partition, 
> whereas rlp3 above is a
> sub-partitioned partition, but both are partitions nonetheless.
> 
> Fixing partprune.c like we're doing with the
> v2_ignore_contradictory_where_clauses_at_partprune_step.patch only works for 
> the latter, because only
> partitioned tables visit partprune.c.
> 
> OTOH, the other patch only applies to situations where constraint_exclusion = 
> on.
> 
I see.  I think the following example, discussed earlier in this thread,
would also be solved by your patch, not by
v2_ignore_contradictory_where_clauses_at_partprune_step.patch.

postgres=# set constraint_exclusion to on;

postgres=# explain select * from test2_0_20 where id = 25;
QUERY PLAN
--------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)


> > I think that the place of check contradiction process was wrong At
> > ignore_contradictory_where_clauses_at_partprune_step.patch.
> > So I fixed it.
> 
> Thanks.  Patch contains some whitespace noise:
> 
> $ git diff --check
> src/backend/partitioning/partprune.c:790: trailing whitespace.
> + * given its partition constraint, we can ignore it,
> src/backend/partitioning/partprune.c:791: trailing whitespace.
> + * that is not try to pass it to the pruning code.
> src/backend/partitioning/partprune.c:792: trailing whitespace.
> + * We should do that especially to avoid pruning code
> src/backend/partitioning/partprune.c:810: trailing whitespace.
> +
> src/test/regress/sql/partition_prune.sql:87: trailing whitespace.
> +-- where clause contradicts sub-partition's constraint
> 
> Can you please fix it?
> 
Thanks for checking.
I'm attaching the latest patch.

> 
> BTW, now I'm a bit puzzled between whether this case should be fixed by 
> hacking on partprune.c like
> this patch does or whether to work on getting the other patch committed and 
> expect users to set
> constraint_exclusion = on for this to behave as expected.  The original 
> intention of setting
> parti

RE: Problem with default partition pruning

2019-04-09 Thread Yuzuko Hosoya
Horiguchi-san,

Thanks for your comments.

> -Original Message-
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> Sent: Tuesday, April 09, 2019 10:33 AM
> To: hosoya.yuz...@lab.ntt.co.jp
> Cc: langote_amit...@lab.ntt.co.jp; thibaut.madela...@dalibo.com; 
> imai.yoshik...@jp.fujitsu.com;
> pgsql-hackers@lists.postgresql.org
> Subject: Re: Problem with default partition pruning
> 
> Sigh..
> 
> At Tue, 09 Apr 2019 10:28:48 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
>  wrote in
> <20190409.102848.252476604.horiguchi.kyot...@lab.ntt.co.jp>
> > As the second thought. Partition constraint is not constraint
> > expression so that's fair to apply partqual ignoring
> > constraint_exclusion. The variable is set false to skip useless
> > expression evaluation on all partitions, but partqual should be
> > evaluated just once.  Sorry for my confusion.
> >
> > So still it is wrong that the new code is added in
> > gen_partprune_steps_internal.
> 
> So still it is wrong that the new code is added at the beginning of the loop 
> on clauses in
> gen_partprune_steps_internal.
> 
> >   If partqual results true and the clause
> > is long, the partqual is evaluated uselessly at every recursion.
> >
> > Maybe we should do that when we find that the current clause doesn't
> > match part attributes. Specifically just after the for loop "for (i =
> > 0 ; i < part_scheme->partnattrs; i++)".
>
I think we should check whether the WHERE clause contradicts the partition
constraint even when the clause matches the partition key attributes.  So I
moved the "if (partqual)" block to the beginning of the loop you mentioned.

I'm attaching the latest version.  Could you please check it again?

Best regards,
Yuzuko Hosoya


v4_ignore_contradictory_where_clauses_at_partprune_step.patch
Description: Binary data


Re: Problem with default partition pruning

2019-04-09 Thread Yuzuko Hosoya
Horiguchi-san,

> -Original Message-
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> Sent: Tuesday, April 09, 2019 5:37 PM
> To: hosoya.yuz...@lab.ntt.co.jp
> Cc: langote_amit...@lab.ntt.co.jp; thibaut.madela...@dalibo.com; 
> imai.yoshik...@jp.fujitsu.com; pgsql-hackers@lists.postgresql.org
> Subject: Re: Problem with default partition pruning
> 
> Hi.
> 
> At Tue, 9 Apr 2019 16:41:47 +0900, "Yuzuko Hosoya" 
>  wrote in 
> <00cf01d4eea7$afa43370$0eec9a50$@lab.ntt.co.jp>
> > > So still it is wrong that the new code is added at the beginning 
> > > of the loop on clauses in gen_partprune_steps_internal.
> > >
> > > >   If partqual results true and the 
> > > > clause is long, the partqual is evaluated uselessly at every recursion.
> > > >
> > > > Maybe we should do that when we find that the current clause 
> > > > doesn't match part attributes. Specifically just after the for 
> > > > loop "for (i =
> > > > 0 ; i < part_scheme->partnattrs; i++)".
> > >
> > I think we should check whether WHERE clause contradicts partition 
> > constraint even when the clause matches part attributes.  So I moved
> 
> Why?  If clauses contains a clause on a partition key, the clause is 
> involved in determination of whether a partition survives or not in 
> ordinary way. Could you show how or on what configuration (tables and
> query) it happens that such a matching clause needs to be checked against 
> partqual?
> 
Thibaut's tests showed that partition pruning didn't work as expected when
we scanned a sub-partition using a WHERE clause that contradicts the
sub-partition's constraint.
The example discussed in this thread is as follows.

postgres=# \d+ test2
                              Partitioned table "public.test2"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
 id     | integer |           |          |         | plain    |              |
 val    | text    |           |          |         | extended |              |
Partition key: RANGE (id)
Partitions: test2_0_20 FOR VALUES FROM (0) TO (20), PARTITIONED,
            test2_20_plus_def DEFAULT

postgres=# \d+ test2_0_20
                            Partitioned table "public.test2_0_20"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
 id     | integer |           |          |         | plain    |              |
 val    | text    |           |          |         | extended |              |
Partition of: test2 FOR VALUES FROM (0) TO (20)
Partition constraint: ((id IS NOT NULL) AND (id >= 0) AND (id < 20))
Partition key: RANGE (id)
Partitions: test2_0_10 FOR VALUES FROM (0) TO (10),
            test2_10_20_def DEFAULT

postgres=# explain (costs off) select * from test2 where id=5 or id=20;
           QUERY PLAN
---------------------------------
 Append
   ->  Seq Scan on test2_0_10
 Filter: ((id = 5) OR (id = 20))
   ->  Seq Scan on test2_10_20_def
 Filter: ((id = 5) OR (id = 20))
   ->  Seq Scan on test2_20_plus_def
 Filter: ((id = 5) OR (id = 20))
(7 rows)

postgres=# explain (costs off) select * from test2_0_20 where id=25;
         QUERY PLAN
-----------------------------
 Seq Scan on test2_10_20_def
   Filter: (id = 25)
(2 rows)

So I think we have to check whether the WHERE clause contradicts the
sub-partition's constraint regardless of whether the clause matches the
partition key attributes.
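
With such a check in place, the expected plan would be something like
(hypothetical output):

postgres=# explain (costs off) select * from test2_0_20 where id=25;
        QUERY PLAN
--------------------------
 Result
   One-Time Filter: false
(2 rows)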

> The "if (partconstr)" block uselessly runs for every clause in the clause 
> tree other than
BoolExpr.
> If we want do that, isn't just doing predicate_refuted_by(partconstr, 
> clauses, false) sufficient before looping over clauses?
Yes, I tried doing that in the original patch.

> 
> 
> > "if (partqual)" block to the beginning of the loop you mentioned.
> >
> > I'm attaching the latest version.  Could you please check it again?
> 
> regards.
> 
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center

Best regards,
Yuzuko Hosoya






Runtime pruning problem

2019-04-16 Thread Yuzuko Hosoya
Hi all,

I found a runtime-pruning test case that may be a problem:


create table t1 (id int, dt date) partition by range(dt);
create table t1_1 partition of t1 for values from ('2019-01-01') to 
('2019-04-01');
create table t1_2 partition of t1 for values from ('2019-04-01') to 
('2019-07-01');
create table t1_3 partition of t1 for values from ('2019-07-01') to 
('2019-10-01');
create table t1_4 partition of t1 for values from ('2019-10-01') to 
('2020-01-01');

In this example, current_date is 2019-04-16.

postgres=# explain select * from t1 where dt = current_date + 400;
                         QUERY PLAN
------------------------------------------------------------
 Append  (cost=0.00..198.42 rows=44 width=8)
   Subplans Removed: 3
   ->  Seq Scan on t1_1  (cost=0.00..49.55 rows=11 width=8)
 Filter: (dt = (CURRENT_DATE + 400))
(4 rows)

postgres=# explain analyze select * from t1 where dt = current_date + 400;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Append  (cost=0.00..198.42 rows=44 width=8) (actual time=0.000..0.001 rows=0 
loops=1)
   Subplans Removed: 3
   ->  Seq Scan on t1_1  (cost=0.00..49.55 rows=11 width=8) (never executed)
 Filter: (dt = (CURRENT_DATE + 400))
 Planning Time: 0.400 ms
 Execution Time: 0.070 ms
(6 rows)


I realize t1_1 was not actually scanned, since "(never executed)" is
displayed in the EXPLAIN ANALYZE output.  But I think "One-Time Filter:
false" and "Subplans Removed: All", or something like that, should be
displayed instead.
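
For comparison, with a plain constant that matches no partition (here
'2020-05-20', which is what CURRENT_DATE + 400 evaluates to above),
plan-time pruning already produces the kind of output I would expect
(hypothetical output):

postgres=# explain (costs off) select * from t1 where dt = '2020-05-20';
        QUERY PLAN
--------------------------
 Result
   One-Time Filter: false
(2 rows)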

What do you think?


Best regards,
Yuzuko Hosoya
NTT Open Source Software Center






A typo in partprune.c

2018-11-08 Thread Yuzuko Hosoya
Hi,

I found a typo in the comment of get_matching_range_bounds.
/*
- * get_matching_range_datums
+ * get_matching_range_bounds
  * Determine the offsets of range bounds matching the specified 
values,
  * according to the semantics of the given operator strategy

Here is the patch to fix it.

Best regards,
Yuzuko Hosoya
NTT Open Source Software Center



fix_partprune_typo.patch
Description: Binary data


RE: A typo in partprune.c

2018-11-08 Thread Yuzuko Hosoya
Hi Michael,

> From: Michael Paquier [mailto:mich...@paquier.xyz]
> Sent: Thursday, November 08, 2018 8:17 PM
> 
> On Thu, Nov 08, 2018 at 07:19:14PM +0900, Yuzuko Hosoya wrote:
> > Here is the patch to fix it.
> 
> Thanks, committed.

Thank you.

Yuzuko Hosoya
NTT Open Source Software Center






Problem with default partition pruning

2019-02-22 Thread Yuzuko Hosoya
Hi,

I found the bug of default partition pruning when executing a range query.

-
postgres=# create table test1(id int, val text) partition by range (id); 
postgres=# create table test1_1 partition of test1 for values from (0) to 
(100); 
postgres=# create table test1_2 partition of test1 for values from (150) to 
(200);
postgres=# create table test1_def partition of test1 default; 

postgres=# explain select * from test1 where id > 0 and id < 30;
                          QUERY PLAN
--------------------------------------------------------------
 Append  (cost=0.00..11.83 rows=59 width=11)
   ->  Seq Scan on test1_1  (cost=0.00..5.00 rows=58 width=11)
 Filter: ((id > 0) AND (id < 30))
   ->  Seq Scan on test1_def  (cost=0.00..6.53 rows=1 width=12)
 Filter: ((id > 0) AND (id < 30))
(5 rows)

There is no need to scan the default partition, but it's scanned.
-

In the current implementation, whether the default partition is scanned
or not is determined according to each condition of the given WHERE
clause in get_matching_range_bounds().  In this example, scan_default
is set to true for id > 0, because values with id >= 200 would fall into
the default partition.  Similarly, scan_default is set to true for
id < 30.  Then these results are combined according to AND/OR at
perform_pruning_combine_step().
In this case, the final result's scan_default is set to true.

The modifications I made are as follows:
- get_matching_range_bounds() determines only the offsets of range bounds
  according to each condition
- These results are combined at perform_pruning_combine_step()
- Whether the default partition is scanned or not is determined at
  get_matching_partitions()

Attached is the patch.  Any feedback is greatly appreciated.
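
With the patch, the default partition should be pruned for the query
above; the expected shape of the plan (a sketch, costs omitted) is:

postgres=# explain (costs off) select * from test1 where id > 0 and id < 30;
               QUERY PLAN
-----------------------------------------
 Append
   ->  Seq Scan on test1_1
         Filter: ((id > 0) AND (id < 30))
(3 rows)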

Best regards,
---
Yuzuko Hosoya
NTT Open Source Software Center


default_partition_pruning.patch
Description: Binary data


RE: Problem with default partition pruning

2019-02-26 Thread Yuzuko Hosoya
Amit-san,

> From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> Sent: Wednesday, February 27, 2019 11:22 AM
> 
> Hosoya-san,
> 
> On 2019/02/22 17:14, Yuzuko Hosoya wrote:
> > Hi,
> >
> > I found the bug of default partition pruning when executing a range query.
> >
> > -
> > postgres=# create table test1(id int, val text) partition by range
> > (id); postgres=# create table test1_1 partition of test1 for values
> > from (0) to (100); postgres=# create table test1_2 partition of test1
> > for values from (150) to (200); postgres=# create table test1_def
> > partition of test1 default;
> >
> > postgres=# explain select * from test1 where id > 0 and id < 30;
> >QUERY PLAN
> > 
> >  Append  (cost=0.00..11.83 rows=59 width=11)
> >->  Seq Scan on test1_1  (cost=0.00..5.00 rows=58 width=11)
> >  Filter: ((id > 0) AND (id < 30))
> >->  Seq Scan on test1_def  (cost=0.00..6.53 rows=1 width=12)
> >  Filter: ((id > 0) AND (id < 30))
> > (5 rows)
> >
> > There is no need to scan the default partition, but it's scanned.
> > -
> >
> > In the current implement, whether the default partition is scanned or
> > not is determined according to each condition of given WHERE clause at
> > get_matching_range_bounds().  In this example, scan_default is set
> > true according to id > 0 because id >= 200 matches the default
> > partition.  Similarly, according to id < 30, scan_default is set true.
> > Then, these results are combined according to AND/OR at 
> > perform_pruning_combine_step().
> > In this case, final result's scan_default is set true.
> >
> > The modifications I made are as follows:
> > - get_matching_range_bounds() determines only offsets of range bounds
> >   according to each condition
> > - These results are combined at perform_pruning_combine_step()
> > - Whether the default partition is scanned or not is determined at
> >   get_matching_partitions()
> >
> > Attached the patch.  Any feedback is greatly appreciated.
> 
> Thank you for reporting.  Can you please add this to March CF in Bugs 
> category so as not to lose
track
> of this?
> 
> I will try to send review comments soon.
> 
Thank you for your reply.  I added this to March CF.

Regards,
Yuzuko Hosoya
NTT Open Source Software Center





Improve selectivity estimate for range queries

2018-12-20 Thread Yuzuko Hosoya
Hi,

I found a problem with selectivity estimates for range queries on master.
This is my test case:

postgres=# CREATE TABLE test(id int, val text);
CREATE TABLE
postgres=# INSERT INTO test SELECT i, 'testval' FROM generate_series(0,29999)i;
INSERT 0 30000
postgres=# VACUUM analyze test;
VACUUM
postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE id > 0 and id < 10000;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..613.00 rows=150 width=12) (actual time=0.044..21.371 rows=9999 loops=1)
   Filter: ((id > 0) AND (id < 10000))
   Rows Removed by Filter: 20001
 Planning Time: 0.099 ms
 Execution Time: 37.142 ms
(5 rows)

Although the actual number of rows was 9999, the estimated number
of rows was 150.

So I dug in to see what happened, and I think the following part of
clauselist_selectivity caused this problem.


/*
  * Now scan the rangequery pair list.
  */
 while (rqlist != NULL)
 {
 RangeQueryClause *rqnext;

 if (rqlist->have_lobound && rqlist->have_hibound)
 {
 /* Successfully matched a pair of range clauses */
 Selectivity s2;

 /*
  * Exact equality to the default value probably means the
  * selectivity function punted.  This is not airtight but should
  * be good enough.
  */
 if (rqlist->hibound == DEFAULT_INEQ_SEL ||
 rqlist->lobound == DEFAULT_INEQ_SEL)
 {
 s2 = DEFAULT_RANGE_INEQ_SEL;
 }
 else
 {
 s2 = rqlist->hibound + rqlist->lobound - 1.0;


In my environment, the selectivity for id > 0 was 0.99990000000000001,
and the selectivity for id < 10000 was 0.33333333333333331.  Then the
values of rqlist->hibound and rqlist->lobound were set accordingly.
On the other hand, DEFAULT_INEQ_SEL is 0.3333333333333333.  As a result,
the final selectivity was computed according to DEFAULT_RANGE_INEQ_SEL,
since the condition of the second if statement was satisfied.  However,
these selectivities were computed from the statistics, so the final
selectivity should have been calculated from rqlist->hibound +
rqlist->lobound - 1.0.
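
To spell out the arithmetic in this example: the statistics-based estimate
would be s2 = 0.33333333333333331 + 0.99990000000000001 - 1.0 = 0.33323,
i.e. about 0.33323 * 30000 = 9997 rows, close to the actual 9999, whereas
the default path gives DEFAULT_RANGE_INEQ_SEL * 30000 = 0.005 * 30000 =
150 rows.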

My test case might be uncommon, but I think this can cause real problems.
To handle such cases, I came up with an idea based on a previous commit [1],
which solved a similar problem related to DEFAULT_NUM_DISTINCT.  Following
that modification, I added to clauselist_selectivity a flag showing whether
the selectivity was calculated from the statistics or not, and used it as
the condition of the if statement instead of rqlist->hibound ==
DEFAULT_INEQ_SEL and rqlist->lobound == DEFAULT_INEQ_SEL.
I'm afraid the changes were larger than I had expected, but with them we
can estimate the selectivity correctly.

Patch attached.  Do you have any thoughts?

[1]
https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=4c2777d0b733220d9029f78817af8ce

Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


improve_selectivity_estimate_for_range_queries.patch
Description: Binary data


RE: Improve selectivity estimate for range queries

2018-12-21 Thread Yuzuko Hosoya
Hi,

Thanks for the comments.
I'm attaching the v2 patch.

> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> Sent: Friday, December 21, 2018 12:25 PM
> 
> Hello.
> 
> At Thu, 20 Dec 2018 17:21:29 +0900, "Yuzuko Hosoya" 
>  wrote in
> <008701d4983d$02e731c0$08b59540$@lab.ntt.co.jp>
> > In my environment, the selectivity for id > 0 was 0.99990000000000001,
> > and the selectivity for id < 10000 was 0.33333333333333331. Then, the
> > value of rqlist->hibound and rqlist->lobound are set respectively.
> > On the other hand, DEFAULT_INEQ_SEL is 0.3333333333333333.  As a
> > result, the final selectivity is computed according to
> > DEFAULT_RANGE_INEQ_SEL, since the condition of the second if statement
> > was satisfied. However, these selectivities were computed according to
> > the statistics, so the final selectivity had to be calculated from 
> > rqlist->hibound +
rqlist->lobound
> - 1.0.
> > My test case might be uncommon, but I think it would cause some problems.
> 
> Yeah, it's known behavior, as the comment just above says. If in your
> example query the table were very large and had an index, it could
> wrongly run an index scan over a fair amount of tuples. But I'm not sure
> how much it matters, and your example doesn't show that.
> 
> The behvavior discussed here is introduced by this commit.
> 
> | commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a
> | Author: Tom Lane 
> | Date:   Tue Nov 9 00:34:46 2004 +
> |
> | Use a hopefully-more-reliable method of detecting default selectivity
> | estimates when combining the estimates for a range query.  As pointed 
> out
> | by Miquel van Smoorenburg, the existing check for an impossible combined
> | result would quite possibly fail to detect one default and one 
> non-default
> | input.  It seems better to use the default range query estimate in such
> | cases.  To do so, add a check for an estimate of exactly 
> DEFAULT_INEQ_SEL.
> | This is a bit ugly because it introduces additional coupling between
> | clauselist_selectivity and scalarltsel/scalargtsel, but it's not like
> | there wasn't plenty already...
> 
> > To handle such cases I've thought up of an idea based on a previous
> > commit[1] which solved a similar problem related to
> > DEFAULT_NUM_DISTINCT.  Just like this modification, I added a flag
> > which shows whether the selectivity
> 
> The commit [1] added almost no complexity, but this does. It affects many
> functions to introduce tighter coupling between operator-selectivity
> functions and clause selectivity functions. isdefault travels alone too
> long just to bind remote functions. We might need a more principled way
> to handle that.
>
Yes, indeed.  But I have not found a better way than this approach yet.
 
> > was calculated according to the statistics or not to
> > clauselist_selectivity, and used it as a condition of the if statement
> > instead of
> > rqlist->hibound == DEFAULT_INEQ_SEL and rqlist->lobound == DEFAULT_INEQ_SEL.
> > I am afraid that changes were more than I had expected, but we can
> > estimate selectivity correctly.
> >
> > Patch attached.  Do you have any thoughts?
> 
> I've just run over the patch, but have some comments.
> 
> + if (isdefault)
> + rqelem->lobound_isdefault = true;
> 
> This is taking the disjunction of lobounds ever added, I suppose it should be 
> the last lobounds'
> isdefault.
> 
Indeed.  I fixed it.

> As you see, four other instances of "selec ==/!= DEFAULT_*" exist in
> mergejoinsel.  Don't they lead to a similar kind of problem?
>
I didn't encounter similar problems, but as you said, such problems could
occur due to the condition selec != DEFAULT_INEQ_SEL.  I changed these
conditions to "!isdefault".

> 
> 
>  Selectivity lobound;        /* Selectivity of a var > something clause */
>  Selectivity hibound;        /* Selectivity of a var < something clause */
> +bool        lobound_isdefault;   /* lobound is a default selectivity? */
> +bool        hibound_isdefault;   /* hibound is a default selectivity? */
>  } RangeQueryClause;
> 
> Around the [1], there was a discussion about introducing the notion of
> uncertainty to selectivity. The isdefault is a kind of uncertainty value
> indicating '0/100% uncertain'. So my feeling is that it would be better
> for Selectivity to have a certainty component, then build selectivity
> arithmetic involving uncertainty, though I don't have a cle

RE: Improve selectivity estimate for range queries

2019-01-10 Thread Yuzuko Hosoya
Hi,

Thanks for the comments, and I'm sorry for the late reply.

> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: Friday, January 11, 2019 7:04 AM
> > Robert Haas  writes:
> > On Fri, Dec 21, 2018 at 11:50 AM Tom Lane  wrote:
> >> A smaller-footprint way to fix the immediate problem might be to
> >> change the values of DEFAULT_INEQ_SEL and friends so that they're
> >> even less likely to be matched by accident.  That is, instead of
> >> 0.3333333333333333, use 0.333333333333342 or some such.
> 
> > That's not a dumb idea, but it seems pretty unprincipled to me, and to
> > be honest I'm kind of surprised that you're not proposing something
> > cleaner.
> 
> The problem is the invasiveness of such a change (large) vs the benefit (not 
> so large).  The
upthread
> patch attempted to add a separate signaling path, but it was very incomplete 
> --- and yet both I
and
> Horiguchi-san thought it was already too messy.
> 
> Maybe at some point we'll go over to something reasonably principled, like 
> adding confidence
intervals
> to all selectivity estimates.  That would be *really* invasive but perhaps 
> would bring enough
benefit
> to justify itself.  But the current patch is just attempting to fix one 
> extremely narrow pain
point,
> and if that is all it's doing it should have a commensurately small 
> footprint.  So I don't think
the
> submitted patch looks good from a cost/benefit standpoint.
> 
Yes, I agree with you.  Indeed, the patch I attached is not good from a
cost/benefit standpoint.  However, I want to solve the problem that real
estimates can happen to equal the default values, as in this case, even
though it's a narrow pain point.  So I tried to distinguish explicitly
between real estimates and defaults, as Robert said.

The idea Tom proposed and Horiguchi-san tried seems reasonable, but I'm
concerned about whether a range query really cannot match
0.333333333333342 (or some such) by accident in any environment.  Is what
Horiguchi-san did enough to prove that?


Best regards,
Yuzuko Hosoya
NTT Open Source Software Center





Proposal: Partitioning Advisor for PostgreSQL

2018-05-24 Thread Yuzuko Hosoya
Hello,

I'm Yuzuko Hosoya.  This is my first time participating in a PostgreSQL project.

I have been developing a partitioning advisor prototype with Julien Rouhaud.
It will be a new feature of HypoPG[1], a PostgreSQL extension, and will help
with partitioning design tuning.  Currently, HypoPG only supports index
design tuning: it allows users to define hypothetical indexes on real tables
and shows the resulting queries' plan/cost with EXPLAIN as if the indexes
were actually built.  Since declarative partitioning will be greatly
improved in PostgreSQL 11 and later versions, there is an emerging need to
support partitioning design tuning as well.  This is why we are working on
the partitioning advisor.  We plan to release the first version of the
partitioning advisor for PostgreSQL 11, and then improve it for
PostgreSQL 12.


Overview of partitioning advisor
---
- Partitioning advisor allows users to define multiple hypothetical partitioning
  schemes on real tables and real data
- PostgreSQL can show resulting queries' plan/cost with EXPLAIN using 
hypothetical
  partitioning schemes
Users can quickly check how their queries would behave if some tables were
partitioned, and try different partitioning schemes (for instance, to
balance query efficiency vs. maintenance efficiency).


Partitioning advisor works as follows:

Usage
-
0. Consider this target table, t1
#= CREATE TABLE t1 (a int, b text);
#= INSERT INTO t1 SELECT i, 'test' FROM generate_series(1,299) i ;
#= EXPLAIN SELECT * FROM t1;
  QUERY PLAN
   -
   Seq Scan on t1  (cost=0.00..4.99 rows=299 width=9)
   (1 row)

1. Partition the target table hypothetically
#= SELECT * FROM hypopg_partition_table('t1','partition by range(a)');
   The hypopg_partition_table() function defines the hypothetical
   range-partitioned table 't1' with partition key 'a' and stores this
   information in backend-local memory.

2. Create hypothetical partitions
#= SELECT * FROM hypopg_add_partition('t1_1','partition of t1 for values 
from (1) to (100)');
#= SELECT * FROM hypopg_add_partition('t1_2','partition of t1 for values 
from (100) to (300)');
   The hypopg_add_partition() function defines the hypothetical partitions
   t1_1 and t1_2 according to their bounds 'from (1) to (100)' and
   'from (100) to (300)' respectively, and stores this information in
   backend-local memory.

3. PostgreSQL can show resulting queries' plan/cost with EXPLAIN
#= EXPLAIN SELECT * FROM t1;
                        QUERY PLAN
   -----------------------------------------------------------
   Append  (cost=0.00..7.49 rows=299 width=9)
 ->  Seq Scan on t1 t1_1  (cost=0.00..1.99 rows=99 width=9)
 ->  Seq Scan on t1 t1_2  (cost=0.00..4.00 rows=200 width=9)
   (3 rows)
  PostgreSQL retrieves the hypothetical partitioning schemes from HypoPG.
  Then, if the referenced table is defined as a hypothetical partitioned
  table, PostgreSQL creates plans using them.

This is a simple example.  In addition, the advisor lets us simulate
range/list/hash partitioning, partition pruning, N-way joins, and
partition-wise join/aggregation.  It is already helpful for users designing
partitioning schemes.


Current implementation

We mainly use get_relation_info_hook().  What we do in this hook is inject
hypothetical partitioning schemes according to the user's definitions.
First, we do everything that is done at expand_inherited_tables(): we expand
root->simple_rte_array and root->parse->rtable, rewrite the target table's
RangeTblEntry as a partitioned table, and create RangeTblEntries and
AppendRelInfos for all hypothetical partitions.  Besides that, we set each
hypothetical partition's name in rte->alias->aliasname at this time so that
EXPLAIN can display the hypothetical partition names.  Then we rewrite the
RelOptInfos as needed.  Specifically, we add the partition information
normally set at set_relation_partition_info() to hypothetical partitioned
tables, and set rel->tuples and rel->pages for hypothetical partitions.


However, PostgreSQL isn't designed to handle hypothetical tables, so we hit
some blockers in the implementation, described below.  We'd like to discuss
these topics.

Topics of discussion
-
- Expanding partitions' RTEs
We have to do for hypothetical partitions everything that is done at
expand_inherited_tables().  But since there are no hooks around there, we
use get_relation_info_hook() as mentioned above.  In this case, we cannot
simulate UPDATE queries correctly, because inheritance_planner(), which
handles UPDATE queries, is called before get_relation_info_hook().
Therefore, we'd like to see if we could add a hook at ex