Re: Purging few months old data and vacuuming in production

2022-12-30 Thread Ron

On 12/30/22 00:39, Ranjith Paliyath wrote:

Hi,

We have a PostgreSQL instance (a slightly old version - PostgreSQL 11.2


You know, of course, that you should update to the latest version. It's 
quick and painless.



on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 
4.8.5-36), 64-bit) in production, where one particular table and its 5 related 
tables need to be purged of data older than 3 months. Each of these tables 
grows by an average of 2 to 3 million records per day.


270M rows isn't that much.  Deleting 3M rows should not take 3 hours, even 
when there are six tables.


Are the tables tied together by FK?

How big are the rows?


The approach needed is a daily purge of data older than 90 days. The expected 
purge processing window is 2 hours. Observed test timings for the deletes 
exceed 2-3 hours, and the vacuuming we are trying to do after the deletes 
takes another 2+ hours.


Is there an index on the date field?

Can you drop unneeded indices during the window, and then rebuild them 
afterward?
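
For instance, a sketch of that pattern (the table, column, and index names here 
are made up; note that CREATE INDEX CONCURRENTLY cannot run inside a 
transaction block):

-- Drop indexes the purge itself does not need, so the deletes write less.
DROP INDEX idx_transactions_status;

-- The purge delete; an index on the date column helps it find the old rows.
DELETE FROM transactions
 WHERE created_at < now() - interval '90 days';

-- Rebuild afterward without blocking concurrent reads and writes.
CREATE INDEX CONCURRENTLY idx_transactions_status ON transactions (status);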


How beefy is your hardware?


There is a suggestion to re-create the tables with partitions; the purge 
approach could then be to drop the old partitions, which would not really 
require vacuuming afterward.
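
The suggested shape would be roughly the following (a sketch only, with 
hypothetical table and column names; DETACH ... CONCURRENTLY needs PostgreSQL 
14 or later):

-- Parent table range-partitioned on the purge date.
CREATE TABLE transactions (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

-- One partition per day, created ahead of time (e.g. by a cron job or pg_partman).
CREATE TABLE transactions_2022_12_01 PARTITION OF transactions
    FOR VALUES FROM ('2022-12-01') TO ('2022-12-02');

-- Daily purge: detach and drop the partition that just fell outside the 90-day
-- window. No row-by-row DELETE and no vacuum needed afterward.
ALTER TABLE transactions DETACH PARTITION transactions_2022_12_01 CONCURRENTLY;
DROP TABLE transactions_2022_12_01;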

Whichever daily purge approach we go for, it should not put a strain on other 
processes that touch this same set of tables; in particular, these tables 
should not get locked because of the purge.

Questions are -
(a) Should we recommend a PostgreSQL upgrade, if possible, to v15.1? Could this 
bring some benefits related to vacuuming?
(b) Would partitioning be an optimal approach?


We tried this with PostgreSQL 12.x declarative partitioning on tables with 
synthetic keys.  Query performance suffered, since the date field needs to 
be added to the PK, and Pg tends to scan all the partitions, even when the 
date field is part of the WHERE clause. Thus, we departitioned all but the 
two tables with large bytea columns.
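
Partition pruning can at least be verified per query; a sketch, assuming a 
table range-partitioned on a created_at column:

-- With constant bounds on the partition key, plan-time pruning should kick in
-- and EXPLAIN lists only the matching partitions.  enable_partition_pruning is
-- on by default; filters on a synthetic key alone (not the partition key) will
-- not prune.
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM transactions
WHERE created_at >= '2022-12-01' AND created_at < '2022-12-08';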


--
Born in Arizona, moved to Babylonia.




Fwd: Segmentation fault on RelationGetDescr in my first extension

2022-12-30 Thread Дмитрий Цветков
Hi!

I'm trying to write my first extension and open a table in it.
I use check_password_hook, and my function executes when a user's password
is changed.

But if I try to open a table inside this function, I get a segmentation fault
on the line with "RelationGetDescr".
What am I doing wrong?

void
_PG_init(void)
{
    prev_check_password_hook = check_password_hook;
    check_password_hook = check_password;
}

...

void
check_password(const char *username, const char *shadow_pass,
               PasswordType password_type, Datum validuntil_time,
               bool validuntil_null)
{
    Relation    rel;
    TupleDesc   tupdesc;
    int         Natts;

    Oid         tbl_oid = DatumGetObjectId(DirectFunctionCall1(to_regclass,
                                CStringGetTextDatum("my_test_table")));

    if (OidIsValid(tbl_oid))
    {
        rel = table_open(tbl_oid, RowExclusiveLock);
        tupdesc = RelationGetDescr(rel);    /* !!! server process (PID 70525) was
                                             * terminated by signal 11: Segmentation fault */
        Natts = tupdesc->natts;
        table_close(rel, RowExclusiveLock);
    }
    else
    {
        ereport(NOTICE, (errmsg("IndexRelationId (%d) is NOT valid !", tbl_oid)));
    }
}

Table is fine:

postgres=# \d my_test_table
  Table "public.my_test_table"
 Column | Type | Collation | Nullable | Default
+--+---+--+-
 name   | text |   |  |


Inserts create new records in reporting table

2022-12-30 Thread Bret Stern

Love the forum,

I'm bringing 100k - 200k AR transactions into a table from several 
separate ERP companies.


As I insert the records, does it make sense to run a trigger event to build or 
insert data into a reporting table (to save generating a query to build the 
reporting table later)?



For example, I bring in 120k records which contain invoice dates going back to 
the beginning of time in the ERP system (e.g. records with dated transactions 
from 2010...2022).


Conceptual data

[Location][InvoiceDate][Amount]

Dallas     2012-01-09   7500.00
Dallas     2012-11-19   1200.00
Dallas     2012-04-02    500.00
Phoenix    2012-01-03   1200.00
Phoenix    2012-04-12   7500.00

Would it be possible, or would it make sense, to create a trigger which 
populates another table like the one below after each insert... or is that a 
resource killer... or is it just unnecessary, and smarter to create a query 
that calculates the per-month columns on the fly from the invoice dates?


[Location][InvoiceDate][Amount]  [Jan]  [Feb]  [Mar]  [Apr]  [May]  [Jun]  [Jul]  [Aug]  [Sep]  [Oct]  [Nov]  [Dec]

Dallas     2012-01-09   7500.00
Dallas     2012-11-19   1200.00
Dallas     2012-04-02    500.00
Phoenix    2012-01-03   1200.00
Phoenix    2012-04-12   7500.00
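
If the on-the-fly route is chosen, a filtered-aggregate query can produce the 
month columns directly. A sketch (the table name ar_transactions and the 
column names are assumptions):

SELECT location,
       date_part('year', invoicedate)::int AS yr,
       sum(amount) FILTER (WHERE date_part('month', invoicedate) = 1)  AS jan,
       sum(amount) FILTER (WHERE date_part('month', invoicedate) = 2)  AS feb,
       sum(amount) FILTER (WHERE date_part('month', invoicedate) = 3)  AS mar,
       sum(amount) FILTER (WHERE date_part('month', invoicedate) = 4)  AS apr,
       sum(amount) FILTER (WHERE date_part('month', invoicedate) = 5)  AS may,
       sum(amount) FILTER (WHERE date_part('month', invoicedate) = 6)  AS jun,
       sum(amount) FILTER (WHERE date_part('month', invoicedate) = 7)  AS jul,
       sum(amount) FILTER (WHERE date_part('month', invoicedate) = 8)  AS aug,
       sum(amount) FILTER (WHERE date_part('month', invoicedate) = 9)  AS sep,
       sum(amount) FILTER (WHERE date_part('month', invoicedate) = 10) AS oct,
       sum(amount) FILTER (WHERE date_part('month', invoicedate) = 11) AS nov,
       sum(amount) FILTER (WHERE date_part('month', invoicedate) = 12) AS "dec"
FROM   ar_transactions
GROUP  BY location, date_part('year', invoicedate)
ORDER  BY location, yr;

If the reporting table really must be materialized at insert time, the usual 
shape is an AFTER INSERT FOR EACH ROW trigger doing INSERT ... ON CONFLICT ... 
DO UPDATE into a (location, year) summary table; but at 100k - 200k rows per 
load, per-row triggers add noticeable overhead, and the query above (or a 
materialized view refreshed after the load) is usually the simpler choice.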


Bret