>From the Department of Redundancy Department:

Attached is a perl script called 'pguniqchk'.  It checks the uniqueness
of unique constraints on tables in a PostgreSQL database using the
PG_TABLES and PG_INDEXES system "tables".

Why would this be useful?

If you're planning to dump and restore the database, this might be a
good sanity check to run before doing it.

Apparently, when such an impossible event occurs, the unique index on
the table only "sees" one of the duplicate rows.  In order to even query
both rows, one must run this SQL command (via psql) to turn off index
scans:

  => set enable_indexscan = off;

The attached script does this, then verifies the uniqueness of the
unique index by scanning the entire table.

It is probably useless for 99.999% of PostgreSQL users, but I thought
I'd share it just in case someone finds it useful, even if only
as a simple example of querying system tables.

How I found the problem:

I had a need to alter the data types of a column on two different tables
(VARCHAR(32) -> VARCHAR(128) and VARCHAR(128) -> TEXT) and drop a column
from another table.  The only way to do this in v7.1.x is to perform a
full dump and then restore.  When I tried to reload the data, I got
unique key violation errors, and data for two other tables did not load.

As it turns out, one table had a single pair of duplicate keys while the
other table had five pair of duplicates and one set of triplicates.

The incident happened around April 05, 2002 (from what I can tell of
the duplicated data), but hasn't happened since.  I was having SCSI
disk errors around that time on my production server, which is the prime
suspect.

NOTES:

- Only tested on PostgreSQL 7.1.3.

- When a UNIQUE INDEX is put on a NULLABLE column, duplicates with NULL
  values are possible.  This is a feature, though the script does not
  check for this case (so don't be alarmed if it finds something).

  7.4. Unique Indexes
  http://www.postgresql.org/idocs/index.php?indexes-unique.html

Dave
#!/usr/bin/perl
#
# pguniqchk.pl - Checks uniqueness of unique keys in tables of a
#   PostgreSQL database
#
# Copyright 2002 David D. Kilzer.  All rights reserved.
#
# This program is licensed under the same terms as Perl itself.
#

use strict;
use warnings;

use vars qw( $dbh );    # database handle

use DBI;
use Getopt::Long;


my $VERSION = sprintf("%d.%02d", q$Revision: 1.3 $ =~ /(\d+)\.(\d+)/);

my $HELP = <<HELP;
pguniqchk [options] database dbuser dbpass [tables, ...]
  (--help)                 this help message
  (--host=host|-h host)    PostgreSQL database host
  (--port=port|-p port)    PostgreSQL database port
  (--verbose)              make output more verbose
  (--version|-v)           display version information
HELP

my $opt_help;           # --help
my $opt_host;           # --host|-h
my $opt_port;           # --port|-p
my $opt_verbose;        # --verbose
my $opt_version;        # --version|-v

my $ret_val;

my $db_dsn;
my $db_user;
my $db_pass;
my @db_tables;


#
# Check command-line switches
#

$ret_val = GetOptions(
                'help'        => \$opt_help,
                'host|h=s'    => \$opt_host,
                'port|p=s'    => \$opt_port,
                'verbose'     => \$opt_verbose,
                'version|v'   => \$opt_version,
           );

if (   ! $ret_val
    || $opt_help
    || (scalar(@ARGV) < 3 && ! $opt_version)
   )
{
    print STDERR $HELP;
    exit (defined $opt_help ? 0 : 1);
}

if ($opt_version)
{
    print STDERR "pguniqchk v$VERSION\n";
    exit 0;
}

$db_dsn  = 'dbi:Pg:dbname=' . shift @ARGV;
$db_user = shift @ARGV;
$db_pass = shift @ARGV;

if (scalar(@ARGV) > 0)
{
    @db_tables = @ARGV;
    @ARGV = ();
}


#
# Handle various command-line arguments
#

if ($opt_host)
{
    $db_dsn .= ';host=' . $opt_host;
}

if ($opt_port)
{
    $db_dsn .= ';port=' . $opt_port;
}


#
# Connect to database
#

eval
{
    $::dbh = DBI->connect($db_dsn, $db_user, $db_pass,
        +{
            RaiseError => 1,
         },
    );
};

if ($@)
{
    die "Error connecting to database: $@";
}


#
# Grab list of tables to check
#

if (scalar(@db_tables) < 1)
{

    eval
    {
        local $::dbh->{RaiseError} = 1 if (! $::dbh->{RaiseError});

        my $i = 0;      # counter
        my $sth;        # statement handle

        $sth = $::dbh->prepare(<<SQL);
  SELECT tablename
    FROM pg_tables
   WHERE tableowner = ?
ORDER BY tablename
SQL

        $sth->bind_param(++$i, $db_user);

        $sth->execute();

        while (my $r = $sth->fetchrow_arrayref())
        {
            push(@db_tables, $r->[0]);
        }

        $sth->finish();
    };

    if ($@)
    {
        $::dbh->disconnect();
        die "Error querying list of tables from database: $@";
    }

}


# 
# Check uniqueness of unique indices on each table
#

eval
{
    local $::dbh->{RaiseError} = 1 if (! $::dbh->{RaiseError});

    my $found_dups = 0;

    # Turn off index scans so we may check for duplicate keys
    print "### Disabling index scans ... " if ($opt_verbose);
    $::dbh->do(qq{ set ENABLE_INDEXSCAN = off });
    print "done\n" if ($opt_verbose);

    foreach my $tab (@db_tables)
    {
        my $i = 0;      # counter
        my $sth;        # statement handle

        $sth = $::dbh->prepare(<<SQL);
  SELECT tablename
        ,indexname
        ,indexdef
    FROM pg_indexes
   WHERE tablename = ?
ORDER BY indexname
SQL

        $sth->bind_param(++$i, $tab);

        $sth->execute();

        while (my $r = $sth->fetchrow_arrayref())
        {
            my $constraint = $r->[1];
            my $index_def = $r->[2];
            $index_def =~ m/\(([^)]+)\)/;
            my @cols = map( ${[ split(' ', $_) ]}[0], @{[ split(',', $1) ]} );
            my $col = join(', ', @cols);

            # $index_def above looks like this (on one line):
            #   CREATE UNIQUE INDEX foo_bar_pkey ON foo_bar
            #          USING btree (foo_id int4_ops, bar_id int4_ops)
            # We then parse out the list of columns in parenthesis into
            #   @cols like this:  @cols = qw(foo_id driver_id);
            # Then we join them into a comma-separated string, $col, for
            #   use with the SQL and status output like this:
            #   $cols = 'foo_id, driver_id';

            if ($index_def =~ m/CREATE\s+UNIQUE\s+INDEX/i)
            {

                my $sql = <<SQL;
  SELECT *
    FROM (  SELECT $col
                  ,COUNT(*) AS count
              FROM $tab
          GROUP BY $col
         ) AS sub
   WHERE count > 1
SQL
                my $sth2 = $::dbh->prepare($sql);

                $sth2->execute();

                if ($sth2->fetchrow_arrayref())
                {
                    my $printable_sql = $sql;
                    $printable_sql =~ s/[\s\n]+/ /mg;
                    $printable_sql =~ s/^\s+//;
                    $printable_sql =~ s/\s+$//;

                    print "*** '$constraint' constraint on ($col)\n"
                        . "    in '$tab' table is NOT UNIQUE.\n";
                    print "    $printable_sql\n";

                    while ($sth2->fetchrow_arrayref()) { ; }

                    $found_dups++;
                }
                else
                {
                    print "--- '$constraint' constraint on ($col)\n"
                        . "    in '$tab' table is unique.\n"
                        if ($opt_verbose);
                }

                $sth2->finish();
            }
            else
            {
                print "xxx '$constraint' is not a unique constraint\n"
                    . "    on '$tab' table: skipped.\n"
                    if ($opt_verbose);
            }
        }

        $sth->finish();
    }

    if ($found_dups)
    {
        print "*** Don't forget to run this SQL before querying the database:\n";
        print "    set enable_indexscan = off;\n";
    }

    # Turn on index scans again
    print "### Enabling index scans ... " if ($opt_verbose);
    $::dbh->do(qq{ set ENABLE_INDEXSCAN = on });
    print "done\n" if ($opt_verbose);
};

if ($@)
{
    $::dbh->do(qq{ set ENABLE_INDEXSCAN = on });
    $::dbh->disconnect();
    die "Error checking uniqueness of unique keys in database: $@";
}

exit 0;

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Reply via email to