[PERFORM] Suggestions for a data-warehouse migration routine
I've ported enough of my company's database to Postgres to make warehousing on PG a real possibility. I thought I would toss my data migration architecture ideas out for the list to shoot apart.

1. Script on production server dumps the production database (MSSQL) to a set of delimited text files.
2. Script on production server moves the files via FTP to a Postgres database server.
3. File Alteration Monitor trigger on the PG server executes a script when the last file is transferred.
4. Script on PG server drops the target database (by issuing a "dropdb" command).
5. Script on PG server re-creates the target database (createdb command).
6. Script on PG server re-creates the tables.
7. Script on PG server issues COPY commands to import the data.
8. Script on PG server indexes the tables.
9. Script on PG server builds de-normalized reporting tables.
10. Script on PG server indexes the reporting tables.
11. Script on PG server creates the needed reporting functions.
12. Vacuum analyze?

My question revolves around the drop/create for the database. Is there a significant downside to this approach? I'm taking it because it is simpler, from a scripting point of view, to simply start from scratch on each warehouse update. If I do not drop the database, I would need to delete the contents of each table and drop all indexes prior to the COPY/data import (a rough sketch of that per-table alternative is below). My assumption is that all the table deletes and index drops would be more expensive than just dropping and re-creating the entire database.

Also, is the vacuum analyze step needed on a freshly minted database where the indexes have all been newly created?

Thanks in advance for all feedback.
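For reference, a minimal sketch of that per-table alternative, assuming a hypothetical table warehouse_table with a single index warehouse_table_idx, a pipe-delimited dump file, and a made-up file path (one block like this per warehouse table):

  -- hypothetical names and path throughout
  DROP INDEX warehouse_table_idx;                   -- drop indexes before the bulk load
  TRUNCATE TABLE warehouse_table;                   -- faster than DELETE and reclaims space immediately
  COPY warehouse_table FROM '/srv/feeds/warehouse_table.txt' WITH DELIMITER '|';
  CREATE INDEX warehouse_table_idx ON warehouse_table (load_date);
  ANALYZE warehouse_table;                          -- COPY does not gather planner statistics, so ANALYZE is still needed

Whether this beats dropdb/createdb is exactly the open question above; either way, the final ANALYZE (or VACUUM ANALYZE) is what gives the planner statistics on the freshly loaded data, since newly created indexes alone do not.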
Re: [PERFORM] Whence the Opterons?
I will second the nod to Penguin Computing. We have a bit of Penguin hardware here (though the majority is Dell). We did have issues with one machine a couple of years ago, but Penguin was very pro-active in addressing that. We recently picked up a dual Opteron system from them and have been very pleased with it so far.

I would be careful of the RHES that it ships with, though. We had machine lockups immediately after the suggested kernel update (we had to downgrade manually). Also, the RH-supplied Postgres binary has issues, so you would need to compile Postgres yourself until the next RH update.

On Fri, 2005-05-06 at 14:39 -0700, Mischa Sandberg wrote:
> After reading the comparisons between Opteron and Xeon processors for Linux,
> I'd like to add an Opteron box to our stable of Dells and Sparcs, for comparison.
>
> IBM, Sun and HP have their fairly pricey Opteron systems.
> The IT people are not swell about unsupported purchases off ebay.
> Anyone care to suggest any other vendors/distributors?
> Looking for names with national support, so that we can recommend as much to our customers.
>
> Many thanks in advance.

--
Richard Rowell [EMAIL PROTECTED]
Bowman Systems
(318) 213-8780
[PERFORM] Strange behaviour
We are using a BI tool that generates some rather ugly queries. One of the ugly queries is taking much longer to return than I think it should.

The select expression, when run alone, returns 35k rows in 2 seconds (http://www.bowmansystems.com/~richard/explain_select.analyze). The "where clause", when run alone, returns 5200 rows in 10 seconds (http://www.bowmansystems.com/~richard/explain_where.analyze). However, when I put the two together it takes much, much longer to run (http://www.bowmansystems.com/~richard/full.analyze).

Can anyone shed any light on what is going on here? Why does the optimizer choose such a slow plan for the combined query (its shape is sketched schematically below) when the only real difference between the full query and the "where only" query is the number of rows in the result set on the "outside" of the "IN" clause?

A few pertinent observations/facts:

1. The query is generated by a BI tool; I know it is ugly and stupid in many cases. However, please try to see the larger issue: if the select and where portions are run separately they are both fast, but together it is insanely slow.
2. The database has vacuumdb -f -z run on it nightly.
3. Modifications to the stock postgresql.conf: shared_buffers = 15000, work_mem = 131072, default_statistics_target = 100.
4. Dual dual-core Opterons, 4 gigs of RAM, 6-disk Serial ATA hardware RAID 10, running Postgres 8.0.3 compiled from source on Debian stable.
5. The tables being queried are only 200 megs or so combined on disk; the whole DB is ~4 gigs:

   SELECT sum(relpages*8/1024) AS size_M FROM pg_class;
    size_m
    --------
      4178

Thanks!
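Schematically, the combined statement has roughly the following shape (hypothetical table and column names here, since the real BI-generated SQL is much larger):

  -- hypothetical names: the fast "select expression" on the outside,
  -- the fast "where clause" reduced to an IN subquery
  SELECT r.client_id, r.answer_value
  FROM report_rows r                  -- alone: ~35k rows in ~2 seconds
  WHERE r.uid IN (
      SELECT f.uid
      FROM filter_rows f              -- alone: ~5200 rows in ~10 seconds
      WHERE f.some_flag = 1
  );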
[PERFORM] Improve performance of query
I'm trying to port our application from MS-SQL to Postgres. We have implemented all of our rather complicated application security in the database. The query that follows takes half a second or less on MS-SQL Server and around 5 seconds on Postgres. My concern is that this data set is rather "small" by our application's standards; it is not unusual for the da_answer table to have 2-4 million records. I'm worried that if this very small data set is taking 5 seconds, then a "regular sized" data set will take far too long.

I originally thought the NOT EXISTS on the "da_data_restrict_except_open" table was killing performance, but the query took the exact same amount of time after I deleted all rows from this table. Note that the hard-coded 99.0 and 4000 parameters, as well as the parameter to svp_getparentproviderids, are the three variables that change from one run of this query to the next.

I'm using Postgres 7.4.5 as packaged in Debian. shared_buffers is set to 57344 and sort_mem = 4096. The machine has an AMD 1.8+ and 1 gig of RAM. Here are some relevant performance statistics:

richard:/usr/share/cups/model# cat /proc/sys/kernel/shmmax
536870912
richard:/usr/share/cups/model# cat /proc/sys/kernel/shmall
536870912
richard:/home/richard# hdparm -tT /dev/hda
 Timing cached reads:        1112 MB in 2.00 seconds = 556.00 MB/sec
 Timing buffered disk reads:  176 MB in 3.02 seconds =  58.28 MB/sec

I have included an EXPLAIN ANALYZE, relevant table counts, and relevant indexing information. (A sketch for timing the UNION branches individually follows the index list below.) If anyone has any suggestions on how to improve performance, TIA!

SELECT tab.answer_id, client_id, question_id, recordset_id, date_effective, virt_field_name
FROM (
      SELECT a.uid AS answer_id, a.client_id, a.question_id, recordset_id, date_effective
      FROM da_answer a
      WHERE a.date_effective <= 99.0
        AND a.inactive != 1
        AND (5000 = 4000
             OR (EXISTS (SELECT * FROM svp_getparentproviderids(1)
                         WHERE svp_getparentproviderids = a.provider_id)))
      UNION
      SELECT a.uid AS answer_id, a.client_id, a.question_id, recordset_id, date_effective
      FROM da_answer a,
           (
             SELECT main_id
             FROM da_data_restrict
             WHERE type_id = 2
               AND (provider_id IN (SELECT * FROM svp_getparentproviderids(1)))
             UNION
             SELECT sa.uid AS main_id
             FROM da_answer sa
             JOIN da_data_restrict_except_closed dr
               ON dr.main_id = sa.uid
              AND dr.type_id = 2
              AND dr.except_provider_id = 1
             WHERE (restricted = 1)
               AND (restricted_closed_except = 1)
               AND sa.covered_by_roi = 1
             UNION
             SELECT sa.uid AS main_id
             FROM da_answer sa
             WHERE (restricted = 0)
               AND (restricted_open_except = 1)
               AND (NOT EXISTS (SELECT dr.main_id
                                FROM da_data_restrict_except_open dr
                                WHERE (dr.main_id = sa.uid)
                                  AND (dr.type_id = 2)
                                  AND (dr.except_provider_id IN
                                       (SELECT * FROM svp_getparentproviderids(1)))))
               AND sa.covered_by_roi = 1
             UNION
             SELECT sa.uid AS main_id
             FROM da_answer sa
             WHERE (restricted = 0)
               AND (restricted_open_except = 0)
               AND sa.covered_by_roi = 1
           ) sec
      WHERE a.covered_by_roi = 1
        AND a.date_effective <= 99.0
        AND a.inactive != 1
        AND a.uid = sec.main_id
        AND 5000 > 4000
     ) tab, da_question q
WHERE tab.question_id = q.uid
  AND (min_access_level <= 4000 OR min_access_level IS NULL)

Table counts from relevant tables:

  da_question                      1095
  da_answer                       21117
  da_data_restrict_except_closed   3087
  da_data_restrict_except_open    13391
  svp_getparentproviderids(1)         1

Relevant indexes:

  create index in_da_data_restrict_provider_id on da_data_restrict(provider_id);
  create index in_da_data_restrict_main_id on da_data_restrict(main_id);
  create index in_da_data_restrict_type_id on da_data_restrict(type_id);
  create index in_da_data_restrict_client_id on da_data_restrict(client_id);
  create index in_da_dr_type_provider on da_data_restrict(type_id,provider_id);
  create index in_da_data_rec_provider_id ON da_data_restrict_except_closed(provider_id);
  create index in_da_data_rec_type_id ON da_data_restrict_except_closed(type_id);
  create index in_da_data_rec_main_id ON da_data_restrict_except_closed(main_id);
  create index in_da_data_rec_except_provider_id ON da_data_restrict_except_closed(except_provider_id);
  create index in_da_data_reo_provider_id ON da_data_restrict_except_open(provider_id);
  create index in_da_data_reo_type_id ON da_data_restrict_except_open(type_id);
  create index in_da_data_reo_main_id ON da_data_restrict_except_open(main_id);
  create index in_da_data_reo_except_provider_id ON da_data_restrict_except_open(except_provider_id);
  create index in_da_answer_client_id ON
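To narrow down where the 5 seconds go, one option (sketched here with the same tables and columns used in the query above) is to EXPLAIN ANALYZE each branch of the inner UNION by itself, e.g. the NOT EXISTS branch:

  -- sketch only: run one UNION branch in isolation to see whether its plan is the slow part
  EXPLAIN ANALYZE
  SELECT sa.uid AS main_id
  FROM da_answer sa
  WHERE restricted = 0
    AND restricted_open_except = 1
    AND NOT EXISTS (SELECT dr.main_id
                    FROM da_data_restrict_except_open dr
                    WHERE dr.main_id = sa.uid
                      AND dr.type_id = 2
                      AND dr.except_provider_id IN
                          (SELECT * FROM svp_getparentproviderids(1)))
    AND sa.covered_by_roi = 1;

Repeating this for the other three branches shows which one dominates the actual time and is therefore the one worth rewriting or indexing differently.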