Review: Needs Fixing

Added a few comments -- my biggest issue is that the script is VACUUMing the 
database, and locking it to do so. I'm generally of the opinion that VACUUM is 
misunderstood and over-used, but here the ultimate purpose of this script is 
to back up the database. In this case, VACUUM INTO would be *vastly* 
preferable, as it'd require no write lock at all and would still result in a 
minimally sized backup image (the original db would remain over-sized, but 
that's a non-issue).
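
For reference, that's a single statement run against the live db (target path 
illustrative):

VACUUM INTO '/tmp/autopkgtest.db.bak';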

Anyway, I've left some more extensive commentary in the diff.

Diff comments:

> diff --git a/charms/focal/autopkgtest-web/webcontrol/db-backup b/charms/focal/autopkgtest-web/webcontrol/db-backup
> new file mode 100755
> index 0000000..e636334
> --- /dev/null
> +++ b/charms/focal/autopkgtest-web/webcontrol/db-backup
> @@ -0,0 +1,262 @@
> +#!/usr/bin/python3
> +"""
> +This script periodically backs up the sqlite3 db to swift storage
> +and clears up old backups
> +"""
> +
> +import atexit
> +import configparser
> +import datetime
> +import gzip
> +import logging
> +import os
> +import shutil
> +import sqlite3
> +import sys
> +import time
> +
> +import swiftclient
> +from helpers.utils import init_db
> +
> +DB_PATH = ""
> +DB_COPY_LOCATION = ""
> +DB_NAME = ""
> +CONTAINER_NAME = "db-backups"
> +MAX_DAYS = 7
> +SQLITE_DUMP_CMD = 'sqlite3 /home/ubuntu/autopkgtest.db ".backup autopkgtest.db.bak"'
> +DB_BACKUP_NAME = ""
> +DB_BACKUP_PATH = ""
> +
> +
> +def db_connect() -> sqlite3.Connection:
> +    """
> +    Establish connection to sqlite3 db
> +    """
> +    global DB_PATH
> +    global DB_NAME
> +    global DB_BACKUP_NAME
> +    global DB_BACKUP_PATH
> +    cp = configparser.ConfigParser()
> +    cp.read(os.path.expanduser("~ubuntu/autopkgtest-cloud.conf"))
> +    DB_PATH = cp["web"]["database"]
> +    DB_NAME = DB_PATH.split("/")[-1]
> +    DB_BACKUP_NAME = "%s.bak" % DB_NAME
> +    DB_BACKUP_PATH = "/tmp/%s" % DB_BACKUP_NAME

from pathlib import Path
#...
DB_PATH = Path(cp['web']['database'])
DB_BACKUP_PATH = Path('/tmp') / (DB_PATH.name + '.bak')

In all seriousness, pathlib is well worth a look; it's one of the best things 
to happen to the stdlib in a loooong time.

> +
> +    db_con = init_db(cp["web"]["database"])
> +
> +    return db_con
> +
> +
> +def is_db_locked(db_con: sqlite3.Connection) -> bool:
> +    """
> +    Check if sqlite3 db is locked, if not, continue with rest of script
> +    """
> +    c = db_con.cursor()
> +    try:
> +        c.execute("BEGIN IMMEDIATE")
> +    except sqlite3.OperationalError as e:
> +        if "database is locked" in str(e):
> +            logging.info(
> +                "Database is locked, full error: %s\nsleeping..." % str(e)
> +            )
> +            time.sleep(5)

This is a slightly weird call; I wouldn't expect a function called 
"is_db_locked" to pause in its implementation, just to return True/False 
immediately and leave the caller to handle any wait-state required. In other 
words, I'd expect the while..pass loop down in the main body to have the 
sleep, rather than this call.

Actually, I'd much rather have a function that just handles acquiring a lock on 
the database, returning when it had done so (or raising an exception if it 
times out). But I wouldn't worry about fixing this, because I'm also going to 
question the need for a lock at all here ...
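
Something like this (a hypothetical acquire_db_lock; untested sketch, reusing 
the script's sqlite3/time imports):

def acquire_db_lock(db_con: sqlite3.Connection, timeout: float = 60.0) -> None:
    # Retry BEGIN IMMEDIATE until we hold the write lock, or give up
    # after `timeout` seconds.
    deadline = time.monotonic() + timeout
    while True:
        try:
            db_con.execute("BEGIN IMMEDIATE")
            return  # caller now holds the lock; COMMIT/END releases it
        except sqlite3.OperationalError:
            if time.monotonic() >= deadline:
                raise TimeoutError("timed out waiting for the database lock")
            time.sleep(5)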

> +            return True
> +        elif "database disk image is malformed" in str(e):
> +            logging.info(
> +                "Database is corrupted! Exiting! Full error: %s" % str(e)
> +            )
> +            sys.exit(1)

Yikes! No, don't go parsing exception messages, which are subject to 
translation and change, to determine serious things like database corruption. 
Worse still, database corruption doesn't appear as an OperationalError at all; 
it's a DatabaseError 
(https://github.com/python/cpython/blob/fbb016973149d983d30351bdd1aaf00df285c776/Modules/_sqlite/util.c#L60).
Regardless, don't do this.

Unfortunately, Python's rather weak DB-API provides no good mechanism for 
determining the precise error of a given engine (some interfaces return the 
error as a value on the exception, but sqlite3 doesn't). However, I wouldn't 
worry overly about this because (as mentioned) I don't think a lock is 
necessary at all here.
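
If you really did need to tell these cases apart, at least branch on the 
exception type rather than the message text; a rough sketch:

try:
    c.execute("BEGIN IMMEDIATE")
except sqlite3.OperationalError:
    # transient conditions like "database is locked": back off and retry
    time.sleep(5)
except sqlite3.DatabaseError:
    # OperationalError subclasses DatabaseError, so this arm only sees
    # the rest (including corruption, per the CPython mapping above)
    sys.exit(1)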

> +        else:
> +            logging.info("Locking db failed: %s" % str(e))
> +            time.sleep(5)
> +            return True
> +    c.execute("END")
> +    return False
> +
> +
> +def backup_db(db_con: sqlite3.Connection):
> +    global DB_BACKUP_PATH
> +    db_backup_con = sqlite3.connect(DB_BACKUP_PATH)
> +    with db_backup_con:
> +        db_con.backup(db_backup_con, pages=1)
> +    db_backup_con.close()
> +
> +
> +def db_checkpoint(db_con: sqlite3.Connection):
> +    c = db_con.cursor()
> +    try:
> +        c.execute("checkpoint")
> +    except sqlite3.OperationalError as e:
> +        logging.info("checkpoint failed: %s" % str(e))
> +        sys.exit(1)
> +
> +
> +def db_vacuum(db_con: sqlite3.Connection):
> +    c = db_con.cursor()
> +    logging.info("vacuum'ing db")
> +    try:
> +        c.execute("vacuum")
> +    except sqlite3.OperationalError as e:
> +        logging.info("vacuum failed: %s" % str(e))
> +        sys.exit(1)
> +
> +
> +def copy_db():
> +    """
> +    Copy database to /tmp
> +    """
> +    global DB_PATH
> +    global DB_COPY_LOCATION
> +    db_name = DB_PATH.split("/")[-1]
> +    DB_COPY_LOCATION = "/tmp/%s" % db_name
> +    shutil.copyfile(DB_PATH, DB_COPY_LOCATION)
> +
> +
> +def compress_db():
> +    """
> +    use gzip to compress database
> +    """
> +    global DB_BACKUP_PATH
> +    with open(DB_BACKUP_PATH, "rb") as f_in, gzip.open(
> +        "%s.gz" % DB_BACKUP_PATH, "wb"
> +    ) as f_out:
> +        f_out.writelines(f_in)
> +
> +
> +def init_swift_con() -> swiftclient.Connection:
> +    """
> +    Establish connection to swift storage
> +    """
> +    swift_creds = {
> +        "authurl": os.environ["OS_AUTH_URL"],
> +        "user": os.environ["OS_USERNAME"],
> +        "key": os.environ["OS_PASSWORD"],
> +        "os_options": {
> +            "region_name": os.environ["OS_REGION_NAME"],
> +            "project_domain_name": os.environ["OS_PROJECT_DOMAIN_NAME"],
> +            "project_name": os.environ["OS_PROJECT_NAME"],
> +            "user_domain_name": os.environ["OS_USER_DOMAIN_NAME"],
> +        },
> +        "auth_version": 3,
> +    }
> +    swift_conn = swiftclient.Connection(**swift_creds)
> +    return swift_conn
> +
> +
> +def create_container_if_it_doesnt_exist(swift_conn: swiftclient.Connection):
> +    """
> +    create db-backups container if it doesn't already exist
> +    """
> +    global CONTAINER_NAME
> +    try:
> +        swift_conn.get_container(CONTAINER_NAME)
> +    except swiftclient.exceptions.ClientException:
> +        swift_conn.put_container(
> +            CONTAINER_NAME,
> +        )
> +
> +
> +def upload_backup_to_db(
> +    swift_conn: swiftclient.Connection,
> +) -> swiftclient.Connection:
> +    global DB_PATH
> +    global DB_BACKUP_PATH
> +    global CONTAINER_NAME
> +    """
> +    Upload compressed database to swift storage under container db-backups
> +    """
> +    now = datetime.datetime.now().strftime("%Y/%m/%d/%H_%M_%S")
> +    object_path = "%s/%s" % (now, DB_PATH.split("/")[-1] + ".gz")
> +    for _ in range(5):
> +        try:
> +            swift_conn.put_object(
> +                CONTAINER_NAME,
> +                object_path,
> +                "%s.gz" % DB_BACKUP_PATH,
> +                content_type="text/plain; charset=UTF-8",
> +                headers={"Content-Encoding": "gzip"},
> +            )
> +            break
> +        except swiftclient.exceptions.ClientException as e:
> +            logging.info("exception: %s" % str(e))
> +            swift_conn = init_swift_con()
> +    return swift_conn
> +
> +
> +def delete_old_backups(
> +    swift_conn: swiftclient.Connection,
> +) -> swiftclient.Connection:
> +    global CONTAINER_NAME
> +    global MAX_DAYS
> +    """
> +    Delete objects in db-backups container that are older than 7 days
> +    """
> +    logging.info("Removing old db backups...")
> +    _, objects = swift_conn.get_container(CONTAINER_NAME)
> +    now = datetime.datetime.now()
> +
> +    for obj in objects:
> +        last_modified = obj["last_modified"].split(".")[0]
> +        timestamp = datetime.datetime.strptime(
> +            last_modified, "%Y-%m-%dT%H:%M:%S"
> +        )
> +        diff = now - timestamp
> +        if diff > datetime.timedelta(days=MAX_DAYS):
> +            logging.info("Deleting %s" % obj["name"])
> +            for _ in range(5):
> +                try:
> +                    swift_conn.delete_object(CONTAINER_NAME, obj["name"])
> +                    break
> +                except swiftclient.exceptions.ClientException as _:
> +                    swift_conn = init_swift_con()
> +    return swift_conn
> +
> +
> +def cleanup():
> +    """
> +    Delete db and compressed db under /tmp
> +    """
> +    global DB_COPY_LOCATION
> +    if os.path.isfile(DB_COPY_LOCATION):
> +        os.remove(DB_COPY_LOCATION)
> +    if os.path.isfile("%s.gz" % DB_COPY_LOCATION):
> +        os.remove("%s.gz" % DB_COPY_LOCATION)
> +
> +
> +if __name__ == "__main__":
> +    # Only bother running script if this unit is the leader.
> +    logging.basicConfig(level="INFO")
> +    if not os.path.isfile("/run/autopkgtest-web-is-leader"):
> +        logging.info("unit is not leader, exiting...")
> +        sys.exit(0)
> +    # connect to db
> +    logging.info("Connecting to db")
> +    db_con = db_connect()
> +    # check to see if database is locked
> +    logging.info("Checking to see if db is locked...")
> +    while is_db_locked(db_con):
> +        pass
> +    # vacuum the db
> +    db_vacuum(db_con)
> +    # backup the db
> +    backup_db(db_con)

Okay ... this is the weird bit. The purpose of this script is to back up the 
database, for which we don't need a lock. The only reason we need a lock is for 
the VACUUM command, and generally I'm *extremely* sceptical of arbitrarily 
running VACUUM without good reason.

No database engine grows its database file indefinitely; you're much better 
off *not* vacuuming it even if that leaves a bunch of unused pages, because 
they'll be periodically reused anyway (as the db expands and shrinks), and 
you'll avoid constantly asking the underlying FS to deallocate and reallocate 
pages for the db. Yes, your database will be slightly larger than it otherwise 
would be, but unless you have a truly pathological case this will always be 
some small percentage of your overall database size, and you shouldn't worry 
about it.

However, it's doubly weird here because of the VACUUM INTO command 
(https://www.sqlite.org/lang_vacuum.html). If you *really* want a minimally 
sized backup, just use VACUUM INTO. It needs only a read transaction on the 
source database, so no exclusive/write lock is required, and it will create a 
minimally sized backup by vacuuming into the *copy* of the database. Yes, the 
original will remain over-sized, but as mentioned this is a non-issue.
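
To illustrate, the lock/vacuum/backup dance above could collapse to something 
like this sketch (reusing the script's DB_BACKUP_PATH; untested):

def backup_db(db_con: sqlite3.Connection) -> None:
    # VACUUM INTO writes a defragmented, minimally sized copy of the
    # db to a new file, needing only a read transaction on the source:
    # no exclusive lock, no separate VACUUM step. It can't run inside
    # an already-open transaction.
    db_con.execute("VACUUM INTO ?", (DB_BACKUP_PATH,))

(The ? binding works because the INTO argument is an ordinary SQL expression.)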

> +    # compress it
> +    logging.info("Compressing db")
> +    compress_db()
> +    # register cleanup if anything fails
> +    logging.info("Registering cleanup function")
> +    atexit.register(cleanup)
> +    # initialise swift conn
> +    logging.info("Setting up swift connection")
> +    swift_conn = init_swift_con()
> +    # create container if it doesn't exist
> +    create_container_if_it_doesnt_exist(swift_conn)
> +    # upload to swift container
> +    logging.info("Uploading db to swift!")
> +    swift_conn = upload_backup_to_db(swift_conn)
> +    # Remove old results
> +    logging.info("Pruning old database backups")
> +    swift_conn = delete_old_backups(swift_conn)
> +    # run cleanup
> +    cleanup()


-- 
https://code.launchpad.net/~andersson123/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/460043
Your team Canonical's Ubuntu QA is requested to review the proposed merge of 
~andersson123/autopkgtest-cloud:sqlite-db-backup into autopkgtest-cloud:master.

