On Converting from MySQL 5.0.x to PostgreSQL 8.1.x
--------------------------------------------------

I am in the process of converting a couple of major sites from MySQL 5.0.22
to PostgreSQL 8.1.4, and I thought I would share some of my observations on
this process and the two database systems in general.  I feel I am in a good
position to do this, as I am fairly familiar in-depth with MySQL's modern
features, am relatively technology-agnostic, and having done this since
the MySQL project has matured substantially over the last couple of years, I
feel I can give insight into the features of both systems from a current
perspective.

I will attempt to do this from an objective standpoint, as I still see
benefits and drawbacks to both systems (no software will ever meet the needs
of every situation).  If you are looking for a document that mocks one system
or another, this is not it; there are countless results for "X sucks" on
$your_favorite_search_engine.

My reasons for making this switch are primarily due to having a bit of spare
time, wanting to expand my horizons and familiarize myself with another
well-respected open source project, some of the nice in-built procedural
language features of PostgreSQL, and basically wanting to form my own
opinion of the features of both systems.  That being said, I have really
been impressed so far with the features I am discovering, and becoming
happier each day.

It is by no means comprehensive; I have just highlighted some of the
more notable and obvious differences as I discovered them.  Most of the
information in here is pretty basic for those of you very familiar with
both systems.  I also only highlight the differences in the Unix/POSIX
versions of these programs.  (I don't run Windows as a server, and I don't
care to.  My technology agnosticism stops at inadequate systems.)

All comments are current as of the time of writing (Spring/Summer 2006).
Error corrections are very welcome.  Flames can go to /dev/null.  Don't care.
Oh, and since I wrote this in vim, it is best read with a monospace font. :^)

Major differences I have noted:
-------------------------------

MySQL 5.0.x:
* Multiple storage engines with different features.
* Supports multi-insert syntax (INSERT INTO foo VALUES (1,2), (3,4) ...)
* A few more access controls on features built-in to the GRANT tables.  Many
 of these are still present, but implemented in other ways in PostgreSQL.
* Single AUTO_INCREMENT column allowed per table.
* Easy, built-in and extensive replication support.
* Single datastore location per server.
* ALL Stored Procedures are kept in the mysql system database.

PostgreSQL 8.1.x:
* Embedded procedures in multiple native languages (stored procedures and
 functions can be written in native Perl, Python, Tcl, SQL, PL/PgSQL)
* Extensive and versatile procedural language functionality.
* User-definable data types and operators.
* Multiple sequence generators allowed per table.
* Replication support still rudimentary.
* Stored procedures are kept (somewhat more logically, imho) in the
 corresponding databases.
* Multiple datastore locations possible using tablespaces concept.
 (For the record, MySQL will have tablespaces when 5.1.x is stabilized.)
* Most system variables, "built-in" types and features configurable as they
 are just kept in a system catalog.
* Allows deletions and subselects to specify the same table (e.g. DELETE
 FROM foo WHERE id IN (SELECT id FROM foo WHERE...) ).  MySQL does not
 allow this as of 5.0.22.
* Copious documentation on the database internals, for extending the
 database itself.

Pointers, tips, quick facts and gotchas for other people converting:
--------------------------------------------------------------------

* Don't bother using an old version, just go for 8.1.4 (or whatever is new
 at the time of your conversion.  This should be common sense.)
* Since Pg uses a full transactional storage engine, the speed is roughly
 comparable to InnoDB, rather than the stock MyISAM format.
* PostgreSQL's TCP port is 5432 by default.
* The main server process on PostgreSQL is 'postmaster'.
* 'postmaster' can be controlled via the 'pg_ctl' command.
* The administrative user is called 'postgres' by default.
* Like MySQL, Pg uses the system user as default, if no username is
 specified when connecting.
* The command-line client is called 'psql'.
* PostgreSQL by default comes configured to disallow network connections.
 To enable these, you must follow these steps:
   1.  Edit $DATADIR/pg_hba.conf and add access permissions.
   2.  Edit $DATADIR/postgresql.conf and uncomment the listen_addresses
       line, setting it to something reasonable.
   3.  Restart postmaster.
* PostgreSQL relies extensively upon quick aliases for common features within
 the CLI shell.  MySQL offers many similar features, but they aren't used
 as much from what I have observed.
* MySQL combines the concepts of 'database' and 'schema' into one.  PostgreSQL
 differentiates the two.  While the hierarchy in MySQL is
 database.table.field, PostgreSQL is roughly: database.schema.table.field.
 A schema is a 'logically grouped set of tables but still kept within a
 particular database.'  This could allow separate applications to be built
 that still rely upon the same database, but can be kept somewhat logically
 separated.  The default schema in each database is called 'public', and is
 the one referred to if no others are specified.  This can be modified with
 'SET search_path TO ...'.
* Pg uses a 'template1' pseudo-database that can be tailored to provide
 default objects for new database creation, if you should desire.  It
 obviously also offers a 'template0' database that is read-only and
 offers a barebones database, more equivalent to the empty db created with
 mysql's CREATE DATABASE statement.
* Pg's ROLEs can specify a single user or a group, and be nested to contain
 multiple users.
* Pg's default character set (in 8.1.4) is UTF8.
* Pg uses the 'serial' column type instead of AUTO_INCREMENT.  This allows
 more than one independent sequence to be specified per table (though the
 utility of this may be of dubious value).  These are closer to Oracle's
 concept of sequence generators, and they can be manipulated with the
 currval(), nextval(), setval(), and lastval() functions.
* Pg requires its tables and databases be 'vacuumed' regularly to remove
 completed transaction snapshots and optimize the tables on disk.  It is
 necessary because the way that PostgreSQL implements true MVCC is by
 writing all temporary transactions to disk and setting a visibility
 flag for the record.  Vacuuming can be performed automatically, and in
 a deferred manner by using vacuum_cost settings to limit it to low-load
 periods or based upon numerous other criteria.  See the manual for more
 information.
* Kept internally in Pg, there is a concept called the OID, which is a
 continuously incremented number used to assign unique IDs to system
 objects.  This allows the database to store and refer uniquely to user
 operators, new databases, basically anything that the system needs to
 refer to in the 'data directory', regardless of user-defined names.
* Most administrative procedures will refuse to run as root, and require you
 to su to the 'postgres' system user to perform the action.
* PgAdminIII gives you a great overview of the hierarchy of system objects
 throughout the server.  Even though you may administrate your server
 primarily via the CLI, as I do, it is still valuable during the learning
 process to use this tool to browse around the various objects, to learn
 the system setup and hierarchy visually.
* While MySQL supports transactions with the InnoDB databases, many MySQL
 users generally do not use them extensively enough.  With Pg, due to the
 behaviour of the server in attempting to ensure data integrity in a
 variety of situations (client disconnection, network trouble, server
 crashes, etc.), it is highly advisable to become familiar and utilize
 transactions a lot more, to ensure your DATA is left in a consistent state
 before and after every change you wish to make.
* There is a conversion utility called 'mysql2pgsql' that will convert
 dump files from the mysqldump format, to a format that psql can
 understand.  It is available at:
 - http://gborg.postgresql.org/project/mysql2psql/projdisplay.php
* To turn on query time output, similar to the mysql CLI, use the '\timing'
 command from psql.  (Note that the time is displayed in milliseconds,
 whereas in the mysql client it is displayed in seconds.)

Common equivalents:
-------------------

MySQL                           PostgreSQL
-----                           -----------
mysql database                  system tables
mysql                           psql
mysqld                          postmaster
mysqladmin                      initdb/dropdb/createuser/dropuser/
                                 createlang/droplang/vacuumdb
mysqldump                       pg_dump/pg_dumpall/pg_restore
-nothing-                       ecpg
SHOW DATABASES;                 \l
SHOW GRANTS;                    \du
SHOW TABLES;                    \dt
DESC tblname;                   \d foo
USE dbname;                     \c dbname
ALTER TABLE foo AUTO_INCREMENT = n;     SELECT setval('seq_name',n);
SHOW PROCESSLIST;               SELECT * FROM pg_stat_activity;
OPTIMIZE TABLE ...              VACUUM ...

Final thoughts:
---------------
Overall, I have been happy and very impressed with the features offered by
PostgreSQL 8.1.4, and believe that I will be using it for the majority of my
future projects.  There are still some niches where I see utility for the
pluggable storage engines and raw speed of MySQL.  I will be keeping my eye
on that project closely, also, as I want to test out the new Scheduled
Events feature from the 5.1 series once it is stabilized.

I have also been using Apress' "Beginning Databases with PostgreSQL, 2nd
Edition" (2005) as my learning material.  I briefly examined Sams'
"PostgreSQL - The Comprehensive Guide, 2nd Ed." (2005) and Pearson's
"PostgreSQL: Introduction and Concepts" (2001), but found the Apress book
the best of the 3 that I had access to.  YMMV.  More info in the book
reviews linked below.

Further information:
--------------------
PostgreSQL manual: http://www.postgresql.org/docs/manuals/
PgAdmin3: http://www.pgadmin.org/
PostgreSQL book reviews:
 http://techdocs.postgresql.org/techdocs/bookreviews.php
Community Support Channel: irc.freenode.net in #postgresql
Varlena Consulting's General Bits archives:
 http://www.varlena.com/GeneralBits/

---------

Thank you, and I hope that these notes prove helpful to others!

Jason McManus

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to