On Mon, Feb 11, 2019 at 3:55 PM Ashwin Agrawal <aagra...@pivotal.io> wrote:
> > Thanks for the response and inputs. > > On Sat, Feb 9, 2019 at 4:51 AM Andres Freund <and...@anarazel.de> wrote: > >> Hi, >> >> On 2019-02-08 16:36:13 -0800, Alexandra Wang wrote: >> > Current sequence of operations for drop database (dropdb()) >> > 1. Start Transaction >> > 2. Make catalog changes >> > 3. Drop database buffers >> > 4. Forget database fsync requests >> > 5. Checkpoint >> > 6. Delete database directory >> > 7. Commit Transaction >> > >> > Problem >> > This sequence is unsafe from couple of fronts. Like if drop database, >> > aborts (means system can crash/shutdown can happen) right after buffers >> are >> > dropped step 3 or step 4. The database will still exist and fully >> > accessible but will loose the data from the dirty buffers. This seems >> very >> > bad. >> > >> > Operation can abort after step 5 as well in which can the entries >> remain in >> > catalog but the database is not accessible. Which is bad as well but >> not as >> > severe as above case mentioned, where it exists but some stuff goes >> > magically missing. >> > >> > Repo: >> > ``` >> > CREATE DATABASE test; >> > \c test >> > CREATE TABLE t1(a int); CREATE TABLE t2(a int); CREATE TABLE t3(a int); >> > \c postgres >> > DROP DATABASE test; <<====== kill the session after >> DropDatabaseBuffers() >> > (make sure to issue checkpoint before killing the session) >> > ``` >> > >> > Proposed ways to fix >> > 1. CommitTransactionCommand() right after step 2. This makes it fully >> safe >> > as the catalog will have the database dropped. Files may still exist on >> > disk in some cases which is okay. This also makes it consistent with the >> > approach used in movedb(). >> >> To me this seems bad. The current failure mode obviously isn't good, but >> the data obviously isn't valuable, and just loosing track of an entire >> database worth of data seems worse. >> > > So, based on that response seems not loosing track to the files associated > with the database is design choice we wish to achieve. Hence catalog having > entry but data directory being deleted is fine behavior to have and doesn't > need to be solved. > > > 2. Alternative way to make it safer is perform Checkpoint (step 5) just > >> > before dropping database buffers, to avoid the unsafe nature. Caveats of >> > this solution is: >> > - Performs IO for data which in success case anyways will get deleted >> > - Still doesn't cover the case where catalog has the database entry but >> > files are removed from disk >> >> That seems like an unacceptable slowdown. >> > > Given dropping database should be infrequent operation and only addition > IO cost is for buffers for that database itself as Checkpoint is anyways > performed in later step, is it really unacceptable slowdown, compared to > safety it brings ? > > >> >> > 3. One more fancier approach is to use pending delete mechanism used by >> > relation drops, to perform these non-catalog related activities at >> commit. >> > Easily, the pending delete structure can be added boolean to convey >> > database directory dropping instead of file. Given drop database can't >> be >> > performed inside transaction, not needed to be done this way, but this >> > makes it one consistent approach used to deal with on-disk removal. >> >> ISTM we'd need to do something like this. >> > > Given the above design choice to retain link to database files till > actually deleted, not seeing why pending delete approach any better than > approach 1. This approach will allow us to track the database oid in commit > transaction xlog record but any checkpoint post the same still looses the > reference to the database. Which is same case in approach 1 where separate > xlog record XLOG_DBASE_DROP is written just after committing the > transaction. > When we proposed approach 3, we thought its functionally same as approach > 1 just differs in implementation. But your preference to this approach and > stating approach 1 as bad, reads as pending deletes approach is > functionally different, we would like to hear more how? > > Considering the design choice we must meet, seems approach 2, moving > Checkpoint from step 5 before step 3 would give us the safety desired and > retain the desired link to the database till we actually delete the files > for it. > Awaiting response on this thread, highly appreciate the inputs.