Hi, I've been experimenting with a migration from MySQL to Postgres.
One problem I've come across is that there are a handful of duplicate rows in the Filename table:

mysql> select count(*) as filecount, Filename.Name from Filename
    -> GROUP BY Filename.Name ORDER BY filecount DESC LIMIT 30;
+-----------+-----------------------------------+
| filecount | name                              |
+-----------+-----------------------------------+
|         2 | 0                                 |
|         2 | 1                                 |
|         2 | 2                                 |
|         2 | tt172.16.17.36-www_gmail_com.html |

mysql> select * from Filename WHERE name IN ('0','1','2','tt172.16.17.36-www_gmail_com.html');
+------------+-----------------------------------+
| FilenameId | Name                              |
+------------+-----------------------------------+
|       9247 | 0                                 |
|     101380 | 0                                 |
|       9248 | 1                                 |
|     101381 | 1                                 |
|       9249 | 2                                 |
|     101382 | 2                                 |
|     575752 | tt172.16.17.36-www_gmail_com.html |
|     625369 | tt172.16.17.36-www_gmail_com.html |
+------------+-----------------------------------+

and there doesn't appear to be any constraint preventing this:

mysql> describe Filename;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| FilenameId | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| Name       | blob             | NO   | MUL | NULL    |                |
+------------+------------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

mysql> show create table Filename;
+----------+-----------------------------------------
| Filename | CREATE TABLE `Filename` (
  `FilenameId` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `Name` blob NOT NULL,
  PRIMARY KEY (`FilenameId`),
  KEY `Name` (`Name`(255))
) ENGINE=MyISAM AUTO_INCREMENT=4277244 DEFAULT CHARSET=latin1 |
+----------+-----------------------------------------

However, the PostgreSQL table does have this constraint and complains when I try to insert the data:
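For what it's worth, here is a more general version of the check above — an untested sketch, not limited to the four names found so far — that should list every duplicated name together with all of its FilenameIds:

```sql
-- List every Name that occurs more than once, with all of its ids,
-- so the duplicates can be reviewed before deciding how to prune them.
SELECT f.FilenameId, f.Name
FROM Filename f
JOIN (
    SELECT Name
    FROM Filename
    GROUP BY Name
    HAVING COUNT(*) > 1
) dup ON f.Name = dup.Name
ORDER BY f.Name, f.FilenameId;
```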
bacula=# COPY Filename FROM '/var/tmp/bacula-backup/Filename.txt';
ERROR:  duplicate key value violates unique constraint "filename_name_idx"
CONTEXT:  COPY filename, line 101380: "101380 0"

I could of course prune the four duplicate lines from the data before inserting, but I'm afraid of the possible effect on a future restore. It appears there is a duplicate entry in the File table for each time there is a duplicate in the Filename table:

mysql> SELECT * FROM File WHERE FilenameId IN
    ->   (select FilenameId from Filename
    ->    WHERE name IN ('0','1','2','tt172.16.17.36-www_gmail_com.html'));
+-----------+-----------+-------+---------+------------+--------+-----------------------------------------------------------+------------------------+
| FileId    | FileIndex | JobId | PathId  | FilenameId | MarkId | LStat                                                     | MD5                    |
+-----------+-----------+-------+---------+------------+--------+-----------------------------------------------------------+------------------------+
| 245079464 |    207178 | 13471 |    3870 |       9249 |      0 | gB iwKO IGk B Po Po A Hg BAA I BNTy22 BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 245079465 |    207178 | 13471 |    3870 |     101382 |      0 | gB iwKO IGk B Po Po A Hg BAA I BNTy22 BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 215364675 |    173643 | 11277 |    3870 |       9249 |      0 | gB iwKO IGk B Po Po A Hg BAA I BM/DvL BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 215364676 |    173643 | 11277 |    3870 |     101382 |      0 | gB iwKO IGk B Po Po A Hg BAA I BM/DvL BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 259000887 |    290784 | 14315 |  745797 |       9248 |      0 | A A IH/ B A A A A A A BBg44b 22f3Y BBg44b A A M           | UumkJoY8ZKJd6/HfnwvCDg |
| 259000888 |    290784 | 14315 |  745797 |     101381 |      0 | A A IH/ B A A A A A A BBg44b 22f3Y BBg44b A A M           | UumkJoY8ZKJd6/HfnwvCDg |
| 258833500 |    123399 | 14315 |  807939 |       9248 |      0 | A A IH/ B A A A A A A BBamBf 22f3Y BBamBf A A M           | G4vlZvkmC8DFxl8y0RsnfA |
| 258833501 |    123399 | 14315 |  807939 |     101381 |      0 | A A IH/ B A A A A A A BBamBf 22f3Y BBamBf A A M           | G4vlZvkmC8DFxl8y0RsnfA |
| 244900411 |     28129 | 13471 |    1752 |       9249 |      0 | gB GgUm IGk B Po Po A DH BAA I BNTyw4 BJlpIE BJlpIE A A E | ohnd0rdHuoOlhnsfdNGaAw |
| 244900412 |     28129 | 13471 |    1752 |     101382 |      0 | gB GgUm IGk B Po Po A DH BAA I BNTyw4 BJlpIE BJlpIE A A E | ohnd0rdHuoOlhnsfdNGaAw |
| 224916087 |     41066 | 12104 | 1079177 |     101381 |      0 | gI wox IGk B A A A fO BAA I BNR25S BL3WHV BNRzS7 A A E    | hjBeI+HIpmuRUL7FK42mFA |
| 224916086 |     41066 | 12104 | 1079177 |       9248 |      0 | gI wox IGk B A A A fO BAA I BNR25S BL3WHV BNRzS7 A A E    | hjBeI+HIpmuRUL7FK42mFA |
| 224916081 |     41061 | 12104 | 1079177 |     101382 |      0 | gI wos IGk B A A A 90 BAA I BNR25S BL3WHV BNRzS7 A A E    | RUHfzzaQ4uJdbkSROzHPjQ |
| 224916080 |     41061 | 12104 | 1079177 |       9249 |      0 | gI wos IGk B A A A 90 BAA I BNR25S BL3WHV BNRzS7 A A E    | RUHfzzaQ4uJdbkSROzHPjQ |
| 224916075 |     41056 | 12104 | 1079176 |     101381 |      0 | gI won IGk B A A A IS BAA I BNR25S BL3WHT BNRzS7 A A E    | SbEF5jwAkpQM0EeUwM2s9A |

so perhaps this is safe enough.

Does anyone know how these duplicates may have arisen, and what the best way to proceed is?

Gavin

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
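P.S. If pruning does turn out to be acceptable, one possible cleanup on the MySQL side before dumping — an untested sketch, which assumes it is safe to repoint File rows at the lowest surviving FilenameId for each duplicated name — would be:

```sql
-- For each duplicated Name, keep the lowest FilenameId and repoint
-- any File rows that reference the higher (duplicate) ids.
UPDATE File f
JOIN Filename d ON f.FilenameId = d.FilenameId
JOIN (
    SELECT MIN(FilenameId) AS keepId, Name
    FROM Filename
    GROUP BY Name
    HAVING COUNT(*) > 1
) k ON d.Name = k.Name AND d.FilenameId <> k.keepId
SET f.FilenameId = k.keepId;

-- Then remove the now-unreferenced duplicate Filename rows.
DELETE d FROM Filename d
JOIN (
    SELECT MIN(FilenameId) AS keepId, Name
    FROM Filename
    GROUP BY Name
    HAVING COUNT(*) > 1
) k ON d.Name = k.Name AND d.FilenameId <> k.keepId;
```

Note this would leave paired File rows (same JobId and FileIndex) pointing at the same FilenameId, which seems tolerable given that FileId is the only primary key shown — but I'd want confirmation before running it.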