date:20201203

Re: SELECT INTO deprecation

2020-12-03 Thread Thomas Kellerer

Stephen Frost schrieb am 02.12.2020 um 18:58:
> We should either remove it, or remove the comments that it's deprecated,
> not try to make it more deprecated or try to somehow increase the
> recommendation to not use it.

(I am writing from a "user only" perspective, not a developer)

I don't see any warning about the syntax being "deprecated" in the current 
manual.

There is only a note that says that CTAS is "recommended" instead of SELET INTO,
but for me that's something entirely different than "deprecating" it.

I personally have nothing against removing it, but I still see it used
a lot in questions on various online forums, and I would think that
a lot of people would be very unpleasantly surprised if a feature
gets removed without any warning (the current "recommendation" does not
constitute a deprecation or even removal warning for most people I guess)

I would vote for a clear deprecation message as suggested by Peter, but I would
add "and will be removed in a future version" to it.

Not sure if maybe even back-patching that warning would make sense as well, so
that also users of older versions get to see that warning.

Then target 15 or 16 as the release for removal, but not 14

Thomas

Re: Multi Inserts in CREATE TABLE AS - revived patch

2020-12-03 Thread Dilip Kumar

On Mon, Nov 30, 2020 at 10:49 AM Bharath Rupireddy
 wrote:
>
> Hi,
>
> Currently, required logic for multi inserts (such as buffer slots allocation, 
> flushing, tuple size calculation to decide when to flush, cleanup and so on) 
> is being handled outside of the existing tableam APIs. And there are a good 
> number of cases where multi inserts can be used, such as for existing COPY or 
> for CTAS, CREATE/REFRESH MATERIALIZED VIEW [proposed in this thread], and 
> INSERT INTO SELECTs [here] which are currently under discussion. Handling the 
> same multi inserts logic in many places is error prone and duplicates most of 
> the code. To avoid this, proposing here are generic tableam APIs, that can be 
> used in all the cases and which also gives the flexibility to tableam 
> developers in implementing multi inserts logic dependent on the underlying 
> storage engine[1].
>
> I would like to seek thoughts/opinions on the proposed new APIs. Once 
> reviewed, I will start implementing them.

IMHO, if we think that something really specific to the tableam then
it makes sense to move it there.  But just to avoid duplicating the
code it might not be the best idea.  Instead, you can write some
common functions and we can call them from different places.  So if
something is very much common and will not vary based on the storage
type we can keep it outside the tableam interface however we can move
them into some common functions to avoid duplication.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Huge memory consumption on partitioned table with FKs

2020-12-03 Thread Kyotaro Horiguchi

At Thu, 3 Dec 2020 16:41:45 +0900, Amit Langote  wrote 
in 
> On Thu, Dec 3, 2020 at 2:29 PM Kyotaro Horiguchi
>  wrote:
> > At Thu, 3 Dec 2020 12:27:53 +0900, Amit Langote  
> > wrote in
> > > On Thu, Dec 3, 2020 at 10:15 AM Kyotaro Horiguchi
> > >  wrote:
> > > For the queries on the referencing side ("check" side),
> > > type/collation/attribute name determined using the above are going to
> > > be the same for all partitions in a given tree irrespective of the
> > > attribute number, because they're logically the same column.  On the
> >
> > Yes, I know that, which is what I meant by "practically" or
> > "actually", but it is not explicitly defined AFAICS.
> 
> Well, I think it's great that we don't have to worry *in this part of
> the code* about partition's fk_attnums not being congruent with the
> root parent's, because ensuring that is the responsibility of the
> other parts of the system such as DDL.  If we have any problems in
> this area, they should be dealt with by ensuring that there are no
> bugs in those other parts.

Agreed.

> > Thus that would be no longer an issue if we explicitly define that
> > "When conpraentid stores a valid value, each element of fk_attnums
> > points to logically the same attribute with the RI_ConstraintInfo for
> > the parent constraint."  Or I'd be happy if we have such a comment
> > there instead.
> 
> I saw a comment in Kuroda-san's v2 patch that is perhaps meant to
> address this point, but the placement needs to be reconsidered:

Ah, yes, that comes from my proposal.

> @@ -366,6 +368,14 @@ RI_FKey_check(TriggerData *trigdata)
> querysep = "WHERE";
> for (int i = 0; i < riinfo->nkeys; i++)
> {
> +
> +   /*
> +   * We share the same plan among all relations in a partition
> +   * hierarchy.  The plan is guaranteed to be compatible since all of
> +   * the member relations are guaranteed to have the equivalent set
> +   * of foreign keys in fk_attnums[].
> +   */
> +
> Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
> Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
> 
> A more appropriate place for this kind of comment would be where
> fk_attnums is defined or in ri_BuildQueryKey() that is shared by
> different RI query issuing functions.

Yeah, I wanted more appropriate place for the comment.  That place
seems reasonable.

> > > referenced side ("restrict", "cascade", "set" side), as you already
> > > mentioned, fk_attnums refers to the top parent table of the
> > > referencing side, so no possibility of they being different in the
> > > various referenced partitions' RI_ConstraintInfos.
> >
> > Right. (I'm not sure I have mention that here, though:p)A
> 
> Maybe I misread but I think you did in your email dated Dec 1 where you said:
> 
> "After an off-list discussion, we confirmed that even in that case the
> patch works as is because fk_attnum (or contuple.conkey) always stores
> key attnums compatible to the topmost parent when conparent has a
> valid value (assuming the current usage of fk_attnum), but I still
> feel uneasy to rely on that unclear behavior."

fk_attnums *doesn't* refers to to top parent talbe of the referencing
side. it refers to attributes of the partition that is compatible with
the same element of fk_attnums of the topmost parent.  Maybe I'm
misreading.


> > > On the topic of how we'd be able to share even the RI_ConstraintInfos
> > > among partitions, that would indeed look a bit more elaborate than the
> > > patch we have right now.
> >
> > Maybe just letting the hash entry for the child riinfo point to the
> > parent riinfo if all members (other than constraint_id, of course)
> > share the exactly the same values.  No need to count references since
> > we don't going to remove riinfos.
> 
> Ah, something maybe worth trying.  Although the memory we'd save by
> sharing the RI_ConstraintInfos would not add that much to the savings
> we're having by sharing the plan, because it's the plans that are a
> memory hog AFAIK.

I agree that plans are rather large but the sharable part of the
RI_ConstraintInfos is 536 bytes, I'm not sure it is small enough
comparing to the plans.  But that has somewhat large footprint.. (See
the attached)

> > > > About your patch, it calculates the root constrid at the time an
> > > > riinfo is created, but when the root-partition is further attached to
> > > > another partitioned-table after the riinfo creation,
> > > > constraint_root_id gets stale.  Of course that dones't matter
> > > > practically, though.
> > >
> > > Maybe we could also store the hash value of the root constraint OID as
> > > rootHashValue and check for that one too in
> > > InvalidateConstraintCacheCallBack().  That would take care of this
> > > unless I'm missing something.
> >
> > Seems to be sound.
> 
> Okay, thanks.
> 
> I have attached a patch in which I've tried to merge the ideas from
> both my patch and

Re: pg_stat_statements oddity with track = all

2020-12-03 Thread legrand legrand

Hi Julien,

> The extra field I've proposed would increase the number of records, as it
> needs
to be a part of the key. 

To get an increase in the number of records that means that the same
statement 
would appear at top level AND nested level. This seems a corner case with
very low 
(neglectible) occurence rate. Did I miss something ?

Regards
PAscal 



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Huge memory consumption on partitioned table with FKs

2020-12-03 Thread Kyotaro Horiguchi

At Thu, 03 Dec 2020 17:13:16 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
me> I agree that plans are rather large but the sharable part of the
me> RI_ConstraintInfos is 536 bytes, I'm not sure it is small enough
me> comparing to the plans.  But that has somewhat large footprint.. (See
me> the attached)

0001 contains a bug about query_key and get_ri_constaint_root (from
your patch) is not needed there, but the core part is 0002 so please
ignore them.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_stat_statements oddity with track = all

2020-12-03 Thread Sergei Kornilov

Hello

> To get an increase in the number of records that means that the same
> statement
> would appear at top level AND nested level. This seems a corner case with
> very low
> (neglectible) occurence rate.

+1
I think splitting fields into plans_toplevel / plans_nested will be less 
convenient. And more code with higher chance of copypaste errors

regards, Sergei

Re: pg_stat_statements oddity with track = all

2020-12-03 Thread Julien Rouhaud

On Wed, Dec 02, 2020 at 05:13:56PM +0300, Sergei Kornilov wrote:
> Hello
> 
> > - add a parent_statement_id column that would be NULL for top level queries
> 
> Will generate too much entries... Every FK for each different delete/insert, 
> for example.
> But very useful for databases with a lot of stored procedures to find where 
> this query is called. May be new mode track = tree? Use NULL to indicate a 
> top-level query (same as with track=tree) and some constant for any nested 
> queries when track = all.

Maybe pg_stat_statements isn't the best tool for that use case.  For the record
the profiler in plpgsql_check can now track queryid for each statements inside
a function, so you match pg_stat_statements entries.  That's clearly not
perfect as dynamic queries could generate different queryid, but that's a
start.

> Also, currently a top statement will account buffers usage for underlying 
> statements?

I think so.

Re: pg_stat_statements oddity with track = all

2020-12-03 Thread Julien Rouhaud

On Thu, Dec 03, 2020 at 11:40:22AM +0300, Sergei Kornilov wrote:
> Hello
> 
> > To get an increase in the number of records that means that the same
> > statement
> > would appear at top level AND nested level. This seems a corner case with
> > very low
> > (neglectible) occurence rate.
> 
> +1
> I think splitting fields into plans_toplevel / plans_nested will be less 
> convenient. And more code with higher chance of copypaste errors

As I mentioned in a previous message, I really have no idea if that would be a
corner case or not.  For instance with native partitioning, the odds to have
many different query executed both at top level and as a nested statement may
be quite higher.

Re: Multi Inserts in CREATE TABLE AS - revived patch

2020-12-03 Thread Bharath Rupireddy

On Thu, Dec 3, 2020 at 1:38 PM Dilip Kumar  wrote:
>
> On Mon, Nov 30, 2020 at 10:49 AM Bharath Rupireddy
>  wrote:
> >
> > Currently, required logic for multi inserts (such as buffer slots 
> > allocation, flushing, tuple size calculation to decide when to flush, 
> > cleanup and so on) is being handled outside of the existing tableam APIs. 
> > And there are a good number of cases where multi inserts can be used, such 
> > as for existing COPY or for CTAS, CREATE/REFRESH MATERIALIZED VIEW 
> > [proposed in this thread], and INSERT INTO SELECTs [here] which are 
> > currently under discussion. Handling the same multi inserts logic in many 
> > places is error prone and duplicates most of the code. To avoid this, 
> > proposing here are generic tableam APIs, that can be used in all the cases 
> > and which also gives the flexibility to tableam developers in implementing 
> > multi inserts logic dependent on the underlying storage engine[1].
> >
> > I would like to seek thoughts/opinions on the proposed new APIs. Once 
> > reviewed, I will start implementing them.
>
> IMHO, if we think that something really specific to the tableam then
> it makes sense to move it there.  But just to avoid duplicating the
> code it might not be the best idea.  Instead, you can write some
> common functions and we can call them from different places.  So if
> something is very much common and will not vary based on the storage
> type we can keep it outside the tableam interface however we can move
> them into some common functions to avoid duplication.
>

Thanks for the response. Main design goal of the new APIs is to give
flexibility to tableam developers in implementing multi insert logic
dependent on the underlying storage engine. Currently, for all the
underlying storage engines, we follow the same multi insert logic such
as when and how to flush the buffered tuples, tuple size calculation,
and this logic doesn't take into account the underlying storage engine
capabilities. Please have a look at [1] where this point was brought
up by @Luc Vlaming. The subsequent discussion went on to some level of
agreement on the proposed APIs.

I want to clarify that avoiding duplicate multi insert code (for COPY,
CTAS, CREATE/REFRESH MAT VIEW and INSERT SELECTs) is a byproduct(not a
main design goal) if we implement the new APIs for heap AM. I feel
sorry for projecting the goal as avoiding duplicate code earlier.

I also want to mention that @Andres Freund visualized similar kinds of
APIs in [2].

I tried to keep the API as generic as possible, please have a look at
the new structure and APIs [3].

Thoughts?

[1] - 
https://www.postgresql.org/message-id/ca3dd08f-4ce0-01df-ba30-e9981bb0d54e%40swarm64.com
[2] - 
https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de
[3] - 
https://www.postgresql.org/message-id/CALj2ACV8_O651C2zUqrVSRFDJkp8%3DTMwSdG9%2BmDGL%2BvF6CD%2BAQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Improper use about DatumGetInt32

2020-12-03 Thread Peter Eisentraut


On 2020-11-30 16:32, Alvaro Herrera wrote:

On 2020-Nov-30, Peter Eisentraut wrote:


Patch updated this way.  I agree it's better that way.


Thanks, LGTM.


For a change like this, do we need to change the C symbol names, so that 
there is no misbehavior if the shared library is not updated at the same 
time as the extension is upgraded in SQL?

Single transaction in the tablesync worker?

2020-12-03 Thread Amit Kapila

The tablesync worker in logical replication performs the table data
sync in a single transaction which means it will copy the initial data
and then catch up with apply worker in the same transaction. There is
a comment in LogicalRepSyncTableStart ("We want to do the table data
sync in a single transaction.") saying so but I can't find the
concrete theory behind the same. Is there any fundamental problem if
we commit the transaction after initial copy and slot creation in
LogicalRepSyncTableStart and then allow the apply of transactions as
it happens in apply worker? I have tried doing so in the attached (a
quick prototype to test) and didn't find any problems with regression
tests. I have tried a few manual tests as well to see if it works and
didn't find any problem. Now, it is quite possible that it is
mandatory to do the way we are doing currently, or maybe something
else is required to remove this requirement but I think we can do
better with respect to comments in this area.

The reason why I am looking into this area is to support the logical
decoding of prepared transactions. See the problem [1] reported by
Peter Smith. Basically, when we stream prepared transactions in the
tablesync worker, it will simply commit the same due to the
requirement of maintaining a single transaction for the entire
duration of copy and streaming of transactions. Now, we can fix that
problem by disabling the decoding of prepared xacts in tablesync
worker. But that will arise to a different kind of problems like the
prepare will not be sent by the publisher but a later commit might
move lsn to a later step which will allow it to catch up till the
apply worker. So, now the prepared transaction will be skipped by both
tablesync and apply worker.

I think apart from unblocking the development of 'logical decoding of
prepared xacts', it will make the code consistent between apply and
tablesync worker and reduce the chances of future bugs in this area.
Basically, it will reduce the checks related to am_tablesync_worker()
at various places in the code.

95 matches

Mail list logo