anyone know how to describe from this mailing list? On Thu, May 23, 2024 at 5:16 AM sud <suds1...@gmail.com> wrote:
> Hello , > It's RDS postgres version 15.4. We suddenly saw the > "MaximumUsedTransactionIDs" reach to ~1.5billion and got alerted by team > members who mentioned the database is going to be in shutdown/hung if this > value reaches to ~2billion and won't be able to serve any incoming > transactions. It was a panic situation. > > I have heard of it before , because of the way postgres works and the XID > being a datatype of length 32 bit integer can only represent (2^32)/2=~2 > billion transactions. However, as RDS performs the auto vacuum , we thought > that we need not worry about this issue. But it seems we were wrong. And we > found one adhoc "SELECT '' query was running on the reader instance since > the last couple of days and when that was killed, the max xid > (MaximumUsedTransactionIDs) dropped to 50million immediately. > > So I have few questions, > > 1)This system is going to be a 24/7 up and running system which will > process ~500million business transactions/day in future i.e. ~4-5billion > rows/day inserted across multiple tables each day. And as I understand each > row will have XID allocated. So in that case , does it mean that, we will > need (5billion/24)=~200million XID/hour and thus , if any such legitimate > application "SELECT" query keeps running for ~10 hours (and thus keep the > historical XID alive) , then it can saturate the > "MaximumUsedTransactionIDs" and make the database standstill in > 2billion/200million=~10hrs. Is this understanding correct? Seems we are > prone to hit this limit sooner going forward. > > 2)We have some legitimate cases where the reporting queries can run for > 5-6hrs. So in such cases if the start of this SELECT query happen at 100th > XID on table TAB1, then whatever transactions happen after that time, > across all other tables(table2, table3 etc) in the database won't get > vacuum until that SELECT query on table1 get vacuumed(as database will try > to keep that same 100th XID image) and the XID will just keep incrementing > for new transaction, eventually reaching the max limit. Is my understanding > correct here? > > 3)Although RDS does the auto vacuum by default. but should we also > consider doing manual vacuum without impacting ongoing transactions? > Something as below options > vacuum freeze tab1; > vacuum freeze; > vacuum; > vacuum analyze tab1; > vacuum tab1; > > 4)Had worked in past in oracle database where the similar transaction > identifier is called as "system change number" , but never encountered that > being exhausted and also there it used to have UNDO record and if a SELECT > query needs anything beyond certain limit(set undo_retention parameter) the > select query used to fail with snapshot too old error but not impacting any > write transactions. But in postgres it seems nothing like that happens and > every "Select query" will try to run till its completion without any such > failure, until it gets skilled by someone. Is my understanding correct? > > And in that case, It seems we have to mandatorily set "statement_timeout" > to some value e.g. 4hrs(also i am not seeing a way to set it for any > specific user level, so it will be set for all queries including > application level) and also "idle_in_transaction_session_timeout" to > 5minutes, even on all the prod and non prod databases, to restrict the long > running transactions/queries and avoid such issues in future. Correct me if > I'm wrong. > > Regards > Sud >