On Tue, Oct 12, 2004 at 11:01:08AM -0400, Jerry LeVan wrote:
> Hi,
> I am futzing around with Andrew Stuart's "Catchmail" program
> that stores emails into a postgresql database.
>
> I want to avoid inserting the same email more than once...
> (pieces of the email actually get emplaced into several
> tables).
>
> Is the "Message-ID" header field a globally unique identifier?

Not a postgresql related issue, but yes, Message-ID: is, by definition,
a globally unique identifier. If two messages carry the same Message-ID
then the sender is asserting that those two messages are identical.
See RFC 2822 section 3.6.4.

You will sometimes see a message generated without a Message-ID at all,
but it will usually have had one added by some MTA along the delivery
route. If your MX doesn't add Message-IDs when missing then you may well
see incoming email without Message-IDs (mostly spam).

In practice there are varying levels of competence in implementation
of Message-ID generation, so, very rarely, you'll see syntactically
incorrect Message-IDs that may, in theory, clash.

> I eventually want to have a cron job process my inbox and don't
> want successive cron tasks to keep re-entering the same email :)

I wouldn't try to use Message-ID as a primary key, though. Give
yourself a serial field.

I don't use Message-ID at all in my postgresql-based mailstore. Instead
I use a maildir style spool directory for incoming mail, and the
processes that import those spooled messages into the mailstore use
standard maildir techniques: lock the message on disk, write it to the
DB, move it atomically from the new/ directory to cur/, then commit
the database write. I've pumped millions of emails through this in
production with no problems.

Cheers,
  Steve
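
For anyone who does want to deduplicate on Message-ID while keeping a
serial-style primary key, here is a minimal sketch. It is not the
maildir approach described above, and everything in it is an
assumption for illustration: the table and function names are made up,
sqlite3 stands in for postgresql so the example runs without a server,
and the SHA-256 fallback for messages with no Message-ID is just one
possible choice. In postgresql the rough equivalent would be a serial
column, a UNIQUE constraint on the key column, and
INSERT ... ON CONFLICT DO NOTHING (9.5 and later).

```python
import hashlib
import sqlite3
from email import message_from_bytes


def dedupe_key(raw: bytes) -> str:
    """Return the Message-ID if present, else a digest of the raw message."""
    msg = message_from_bytes(raw)
    # str() guards against the header coming back as a Header object.
    message_id = str(msg.get("Message-ID") or "").strip()
    if message_id:
        return message_id
    # Hypothetical fallback for mail that arrives with no Message-ID.
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the (hypothetical) mailstore and create its table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS message (
            id         INTEGER PRIMARY KEY,   -- surrogate key, like serial
            dedupe_key TEXT NOT NULL UNIQUE,  -- Message-ID or digest
            raw        BLOB NOT NULL
        )
        """
    )
    return conn


def store_message(conn: sqlite3.Connection, raw: bytes) -> bool:
    """Insert the message; return False if its key was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO message (dedupe_key, raw) VALUES (?, ?)",
        (dedupe_key(raw), raw),
    )
    conn.commit()
    return cur.rowcount == 1
```

A cron job could call store_message() on every message in the inbox on
each run; repeats are skipped by the UNIQUE constraint rather than by
application logic. Bear in mind the caveat above: badly generated
Message-IDs can, in theory, clash, in which case this sketch would
silently skip a distinct message.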