Re: [GENERAL] Geographic High-Availability/Replication
Markus Schiltknecht <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> Gregory Stark wrote:
> > Only if your application is single-threaded. By single-threaded I don't
> > refer to operating system threads but to the architecture. If you're
> > processing a large batch file, handling records one by one and waiting
> > for each commit before proceeding, then it's single-threaded. If you
> > have a hundred independent clients on separate connections doing
> > separate things, then each one of them could get 6tps. Which you have
> > will depend on your application and your needs; it may not be something
> > you can change.
>
> Correct.
>
> Plus, as in the implementation of Postgres-R, performance is *not* bound
> to the slowest node. Instead, every node can process transactions at
> its own speed. Slower nodes might then have to queue transactions from
> the faster ones until they catch up again.

I'm curious as to how Postgres-R would handle a situation where the
constant throughput exceeded the processing speed of one of the nodes.
I can see your system working if it's just spike loads and the slow
nodes can catch up during slow periods, but I'm wondering about the
scenarios where an admin has underestimated the hardware requirements
and one or more nodes is unable to keep up.

Just musing, really.

--
Bill Moran
http://www.potentialtech.com
Re: [GENERAL] SQL Diff ?
On 8/26/07, Kevin Kempter <[EMAIL PROTECTED]> wrote:
> On Saturday 25 August 2007 21:10:19 Ron Johnson wrote:
> > On 08/25/07 21:51, Kevin Kempter wrote:
> > > Hi List;
> > >
> > > I have a very large table (52million rows) - I'm creating a copy of it to
> > > rid it of 35G worth of dead space, then I'll do a sync, drop the original
> > > table and rename table2.
> >
> > What is your definition of "dead space"?
> >
> > Bad rows, duplicate rows, old rows?  Something else?
>
> deleted rows that should have been cleaned up with vacuum, problem is the
> client let it go so long that now I can't get a vacuum to finish cause it
> impacts the day2day operations too much. Long story, see my recent questions
> on the performance list for more info.

In your place I would do something like Slony-I does when it replicates
the tables. Create ON INSERT/UPDATE/DELETE triggers on table1 which will
log operations on table1 to some table1_log table. Then copy table1 to
table2. Then replay table1_log on table2, then
BEGIN; LOCK table1; finish replaying the log; DROP TABLE table1;
ALTER TABLE ... RENAME ...; COMMIT;

Or perhaps actually use Slony-I for the above steps? Should work quite
nicely... Or perhaps use SkyTools for it (I've never used it)?

Regards,
Dawid
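For illustration, a rough sketch of that trigger-logging approach. This is
not the poster's code: the log table, function and trigger names are made
up, and it assumes table1 has an integer primary key "id".

  -- Hypothetical sketch; assumes table1 has an integer primary key "id".
  CREATE TABLE table1_log (op char(1), id integer, logged_at timestamptz DEFAULT now());

  CREATE OR REPLACE FUNCTION log_table1_changes() RETURNS trigger AS $$
  BEGIN
      IF TG_OP = 'DELETE' THEN
          INSERT INTO table1_log (op, id) VALUES ('D', OLD.id);
          RETURN OLD;
      END IF;
      INSERT INTO table1_log (op, id) VALUES (substr(TG_OP, 1, 1), NEW.id);
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER table1_changes
      AFTER INSERT OR UPDATE OR DELETE ON table1
      FOR EACH ROW EXECUTE PROCEDURE log_table1_changes();

  -- Later, after table2 has been populated from table1:
  BEGIN;
  LOCK TABLE table1 IN EXCLUSIVE MODE;
  -- drop every row whose key appears in the log, then re-copy the current
  -- versions of the rows that still exist in table1
  DELETE FROM table2 WHERE id IN (SELECT id FROM table1_log);
  INSERT INTO table2
  SELECT t.* FROM table1 t
   WHERE t.id IN (SELECT id FROM table1_log WHERE op IN ('I', 'U'));
  DROP TABLE table1;
  ALTER TABLE table2 RENAME TO table1;
  COMMIT;

Because only the keys are logged, the replay simply re-reads the current
rows from table1 under the lock, which keeps the logic simple at the cost
of one extra pass over the changed keys.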
Re: [GENERAL] FATAL: could not reattach to shared memory (Win32)
On 8/24/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Trevor Talbot" <[EMAIL PROTECTED]> writes:
> > On 8/23/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
> >> Not that wild a guess, really :-) I'd say it's a very good possibility -
> >> but I have no idea why it'd do that, since all backends load the same
> >> DLLs at that stage.
>
> > Not a valid assumption; you can't rely on consistent VM space among
> > multiple [non-cloned] processes without a serious amount of effort.
>
> I'm not sure if you have a specific technical meaning of "clone" in mind
> here, but these processes are all executing the identical executable,
> and taking care to map the shmem early in execution *before* they load
> any DLLs.  So it should work.  Apparently, it *does* work for awhile for
> the OP, and then stops working, which is even odder.

"Clone" in the same sense as fork(): duplicating a process instead of
regenerating it.

Even ignoring things like DLL replacement and LD_PRELOAD-style options,
there's still a lot of opportunity for dynamic behavior. All DLLs have an
initialization routine called by the loader (and on thread creation),
which tends to be used to set up things you don't want the caller to have
to explicitly initialize. DLLs that maintain global state they share with
copies of themselves in other processes can set up shared memory etc. to
do that. They can easily change their behavior based on the environment
at the time of process start. There are also all the hooks for extension
points, such as Winsock LSPs. Most such things happen only after an
explicit initialization (e.g. WSAStartup() or socket creation in the
Winsock case), but between the C runtime and third-party libraries, it
may be happening when you don't expect it.

All that said, I don't actually have a real-world example of process VM
layout changing like this, especially since you are mapping it early to
avoid this very problem. I'd love to find out exactly what's going on in
Terry's case, but I haven't come up with a good way to do it that doesn't
disturb his production environment.

> If you've got a specific suggestion for making it more reliable,
> we're all ears.

To elaborate on what I said earlier, internal_forkexec() creates the
process suspended; while it has an execution environment set up, the
loader hasn't done all the DLL linking and initialization yet, so the
address space is relatively untouched. At that point you could use
VirtualAllocEx() to reserve VM space for the shared memory at the right
address, and proceed with the rest of the setup. When the new backend
starts up, it would then VirtualFree() that space immediately before
calling MapViewOfFileEx() on it.

I can probably set up with the 8.3 tree and MSVC to create an artificial
failure, and play with the above as a fix, but I'm not quite sure when
that will be. There's still the issue of verifying that this is the
problem on Terry's machine, and figuring out a fix for him.

On 8/24/07, Terry Yapt <[EMAIL PROTECTED]> wrote:
> Yes, the Windows system log (application log section) doesn't show any
> error for several days. Then the errors suddenly come back to life and
> repeat every so often. The errors disappear again and return a few hours
> later. After a few hours the system goes down.
>
> Curiosity:
> ==========
> On the log lines I have and I sent to the list:
>   FATAL: could not reattach to shared memory (key=5432001, addr=01D8):
>   Invalid argument
> this one, "addr=01D8", is always the same in spite of the system having
> been shut down and restarted, or the error having been gone for days.

The environment is consistent then. Whatever is going on, when postgres
first starts things are normal; something just changes later, and the
change is temporary.

As vague guides, I would look at some kind of global resource
usage/tracking, and scheduled tasks. Do you see any patterns about WHEN
this happens? During high load periods? Any antivirus or other security
type tasks running on the machine? Any third-party VPN type software?
Fast User Switching or Remote Desktop use?
Re: [GENERAL] SQL Diff ?
On Saturday 25 August 2007 23:49:39 Ron Johnson wrote:
> On 08/25/07 22:21, Kevin Kempter wrote:
> > On Saturday 25 August 2007 21:10:19 Ron Johnson wrote:
> >> On 08/25/07 21:51, Kevin Kempter wrote:
> >>> Hi List;
> >>>
> >>> I have a very large table (52million rows) - I'm creating a copy of it
> >>> to rid it of 35G worth of dead space, then I'll do a sync, drop the
> >>> original table and rename table2.
> >>
> >> What is your definition of "dead space"?
> >>
> >> Bad rows, duplicate rows, old rows?  Something else?
> >
> > deleted rows that should have been cleaned up with vacuum, problem is the
> > client let it go so long that now I can't get a vacuum to finish cause it
> > impacts the day2day operations too much. Long story, see my recent
> > questions on the performance list for more info.
>
> OK.
>
> >>> Once I have the table2 as a copy of table1 what's the best way to
> >>> select all rows that have been changed or modified in table1 since the
> >>> initial load from table1 into table2?
>
> Is this a 24x7 database?

Yes, with little room for extra overhead.

> >>> Also I'll need to delete any rows in table2 that no longer remain in
> >>> table1.
> >>>
> >>> There is no change date column.
> >>> I could do something like select * from table1 where col1 || col2 ||
> >>> col3 etc not in (select col1 || col2 || col3 etc from table2)
> >>>
> >>> but this would be inefficient & slow.
> >>>
> >>> Anyone have a suggestion to do this in an efficient manner?
> >>>
> >>> Thanks in advance
Re: [GENERAL] problem Linking a TTable component to a pgsql view using BCB5
> Does the table have a unique index/primary key?

The view shows fields from two tables. One of the primary keys of one of
the tables is shown by the view.

Thanks,

JLoz
[GENERAL] simple query runs 26 seconds
I have a W2K server and a relatively small database containing all the
required indexes, and I need to sum only a few records. My query takes 26
seconds to run. How can I fix this?

Andrus.

explain analyze
select sum(taitmata) as ukogus
from rid
join dok using (dokumnr)
where toode = 'NE TR' and doktyyp = 'U'

returns

"Aggregate  (cost=47581.25..47581.26 rows=1 width=8) (actual time=26462.178..26462.178 rows=1 loops=1)"
"  ->  Hash Join  (cost=1175.91..47579.54 rows=684 width=8) (actual time=26462.002..26462.108 rows=7 loops=1)"
"        Hash Cond: (rid.dokumnr = dok.dokumnr)"
"        ->  Bitmap Heap Scan on rid  (cost=673.35..46946.99 rows=19703 width=12) (actual time=19.864..26430.100 rows=21335 loops=1)"
"              Recheck Cond: (toode = 'NE TR'::bpchar)"
"              ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..668.42 rows=19703 width=0) (actual time=12.402..12.402 rows=21335 loops=1)"
"                    Index Cond: (toode = 'NE TR'::bpchar)"
"        ->  Hash  (cost=458.47..458.47 rows=3528 width=4) (actual time=0.141..0.141 rows=6 loops=1)"
"              ->  Index Scan using dok_tasudok_unique_idx on dok  (cost=0.00..458.47 rows=3528 width=4) (actual time=0.057..0.125 rows=6 loops=1)"
"                    Index Cond: (doktyyp = 'U'::bpchar)"
"Total runtime: 26462.551 ms"

using

"PostgreSQL 8.2.4 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special)"
Re: [GENERAL] reporting tools
Use www.fyireporting.com. It is open source, uses the excellent
PostgreSQL Npgsql drivers, and uses the standard RDL report format.

Andrus.

"Geoffrey" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> We are looking for a reporting tool that will enable users to generate
> their own reports.  Something like Crystal Reports.
>
> Anyone using something like this with Postgresql?
>
> --
> Until later, Geoffrey
>
> Those who would give up essential Liberty, to purchase a little
> temporary Safety, deserve neither Liberty nor Safety.
>  - Benjamin Franklin
Re: [GENERAL] FATAL: could not reattach to shared memory (Win32)
Trevor Talbot wrote:
> The environment is consistent then. Whatever is going on, when postgres
> first starts things are normal; something just changes later, and the
> change is temporary.
>
> As vague guides, I would look at some kind of global resource
> usage/tracking, and scheduled tasks. Do you see any patterns about WHEN
> this happens? During high load periods? Any antivirus or other security
> type tasks running on the machine? Any third-party VPN type software?
> Fast User Switching or Remote Desktop use?

I have spent a lot of time looking for patterns in the system logs,
Apache logs, Postgres logs, etc., and I have not found any conclusive
clue. All I can say is that I get this kind of error in the PostgreSQL
logs:

'2007-08-21 15:19:21 ERROR:  could not open relation 16692/16694/17295: Invalid argument'

The next log line(s) show the statement involved. But that statement runs
fine and gives the right results when I copy and paste it into any SQL
editor connected to that database. Those errors are not linked in time
with the FATAL errors we are discussing in this thread.

I am trying to get the opportunity to migrate that database to another
server and use the current server to test anything we want, but the
customer is reluctant to hand it over for a try-and-test process because
it is also their mail and web server. :-(

Although the server is remote, far away from my location, I have a
console (UltraVNC) to it if you need to look for something.

Greetings.
Re: [GENERAL] Geographic High-Availability/Replication
Hi,

Bill Moran wrote:
> I'm curious as to how Postgres-R would handle a situation where the
> constant throughput exceeded the processing speed of one of the nodes.

Well, what do you expect to happen? This case is easily detectable, but I
can only see two possible solutions: either stop the node which is too
slow, or stop accepting new transactions for a while.

This technique is not meant to allow nodes to lag behind by several
thousand transactions - that is better avoided. Rather, it's meant to
decrease the commit delay necessary for synchronous replication.

> I can see your system working if it's just spike loads and the slow
> nodes can catch up during slow periods, but I'm wondering about the
> scenarios where an admin has underestimated the hardware requirements
> and one or more nodes is unable to keep up.

Please keep in mind that replication per se does not speed up your
database; rather, it adds a layer of reliability, which *costs* some
performance. To increase the transactional throughput you would need to
add partitioning to the mix. Or you could try to make use of the gained
reliability and abandon WAL - you won't need it as long as at least one
replica is running - which should increase the single node's throughput
and therefore the cluster's throughput, too.

When replication meets partitioning and load balancing, you get into a
whole new world, where new trade-offs need to be considered. Some look
similar to those with RAID storage - probably Sequoia's term RAIDb isn't
bad at all.

Regards

Markus
Re: [GENERAL] simple query runs 26 seconds
"Andrus" <[EMAIL PROTECTED]> writes: > My query takes 26 seconds to run. The time seems entirely spent in fetching rows from table "rid". Perhaps that table is bloated by lack of vacuuming --- can you show the output from "vacuum verbose rid"? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] problem Linking a TTable component to a pgsql view using BCB5
On 8/23/07, JLoz <[EMAIL PROTECTED]> wrote:
> > Does the table have a unique index/primary key?
>
> The view shows fields from two tables. One of the primary keys of one of
> the tables is shown by the view.

You will probably have better luck with the TQuery component. Also, you
should try out another TDataset implementation... for example Zeos. The
BDE is old, slow, and buggy.

merlin
[GENERAL] Restore v. Running COPY/INDEX separately
Hi,

So, I built my tables which contain a TSearch2 field by:

1. Create table without indexes
2. COPY data into table
3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);
5. Index all the fields including the TSearch2 field

The process takes several days. In contrast, if I back up the table and
restore it to a new table, it takes a fraction of the time of running the
above operations manually. I am building my indexes at the end, but I
think step 4 may be causing unneeded overhead. Can I somehow just copy
data into the idxFTI field during the COPY process? Is there anything
else I can do to get my loading process to perform similarly to
backup/restore? Does pg_dump also dump the indexes? That would explain
why it is so much faster...

Benjamin
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
On Sat, Aug 25, 2007 at 10:18:25AM -0400, Bill Moran wrote:
> "Phoenix Kiula" <[EMAIL PROTECTED]> wrote:
> >
> > We're moving from MySQL to PG, a move I am rather enjoying, but
> > we're currently running both databases. As we web-enable our
> > financial services in fifteen countries, I would like to recommend
> > to the team that we move entirely to PG.
> >
> > In doing research on big installations of the two databases, I
> > read this from a MySQL senior exec on Slashdot:
>
> Senior MySQL exec means this is a marketing blurb, which means it's
> exaggerated, lacking any honest assessment of challenges and
> difficulties, and possibly an outright lie.  I've no doubt that
> MySQL can do clusters if you know what you're doing, but if you want
> the facts, you're going to have to look deeper than that obviously
> biased quote.  I seem to remember a forum thread with someone having
> considerable difficulty with MySQL cluster, and actual MySQL
> employees jumping in to try to help and no solution ever found.
> Anyone have that link lying around?

I think this is the one.

http://forums.mysql.com/read.php?25,93181,93181

Cheers,
David.
--
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
phone: +1 415 235 3778    AIM: dfetter666    Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate
Re: [GENERAL] Restore v. Running COPY/INDEX separately
On Sun, 26 Aug 2007, Benjamin Arai wrote:

> Hi,
>
> So, I built my tables which contains a TSearch2 field by
>
> 1. Create table without indexes
> 2. COPY data into table
> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);

vacuum here

> 5. Index all the fields including the TSearch2 field
>
> The process takes several days. In contrast, if I backup the table and
> restore it to a new table it takes a fraction of the time as running the
> above operation manually. I am building my indexes at the end but I think
> the step 4 may be causing uneeded overhead. Can I somehow just copy data
> into the idxFTI field during the copy process? Is there anything else I
> can do to get my loading process to perform similar to backup/restore?
> Does pg_dump also dump the indexes? That would explain why it is so much
> faster...
>
> Benjamin

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [GENERAL] Restore v. Running COPY/INDEX separately
"Oleg Bartunov" <[EMAIL PROTECTED]> writes: > On Sun, 26 Aug 2007, Benjamin Arai wrote: > >> Hi, >> >> So, I built my tables which contains a TSearch2 field by >> >> 1. Create table without indexes >> 2. COPY data into table >> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector; >> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage); > > vacuum here Or you could do something tricky and do the update like this which would avoid the need to vacuum: ALTER TABLE tblMessages ALTER COLUMN idxFTI TYPE tsvector USING to_tsvector('default, strMesage); This only works because ALTER TABLE rewrites the table from scratch any time it does an operation like this. Don't try this if there are transactions working against the table at the same time (such as a pg_dump!). Or you could set up a trigger to generate the tsvector when you first load the data instead of adding it later. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] SQL Diff ?
On Aug 26, 2007, at 9:02 AM, Dawid Kuroczko wrote:
> On 8/26/07, Kevin Kempter <[EMAIL PROTECTED]> wrote:
> > On Saturday 25 August 2007 21:10:19 Ron Johnson wrote:
> > > On 08/25/07 21:51, Kevin Kempter wrote:
> > > > Hi List;
> > > >
> > > > I have a very large table (52million rows) - I'm creating a copy of
> > > > it to rid it of 35G worth of dead space, then I'll do a sync, drop
> > > > the original table and rename table2.
> > >
> > > What is your definition of "dead space"?
> > >
> > > Bad rows, duplicate rows, old rows?  Something else?
> >
> > deleted rows that should have been cleaned up with vacuum, problem is
> > the client let it go so long that now I can't get a vacuum to finish
> > cause it impacts the day2day operations too much. Long story, see my
> > recent questions on the performance list for more info.
>
> In your place I would do something like Slony-I does when it replicates
> the tables. Create ON INSERT/UPDATE/DELETE triggers on table1 which will
> log operations on table1 to some table1_log table. Then copy table1 to
> table2. Then replay table1_log on table2, then
> BEGIN; LOCK table1; finish replaying the log; DROP TABLE table1;
> ALTER TABLE ... RENAME ...; COMMIT;
>
> Or perhaps actually use Slony-I for the above steps? Should work quite
> nicely... Or perhaps use SkyTools for it (I've never used it)?

Yeah, for trigger-based replication it'd be simpler to just use Slony-I
or SkyTools. However, if you're on 8.2, with row-wise comparisons, you
could do something like:

begin;
lock table1;
insert into table2 select * from table1 where id not in (select id from table2);
drop table table1;
alter table table2 rename to table1;
commit;

Here id is your primary key. Note that if your ids are generated by a
sequence you'll need to use setval on the sequence to get it "caught up"
before that commit or you'll get duplicate key errors immediately.

Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
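For the sequence catch-up Erik mentions, a one-liner along these lines
would do it; the sequence name here is hypothetical - use whichever
sequence actually feeds the id column of the surviving table:

  -- hypothetical sequence name; run before COMMIT
  SELECT setval('table1_id_seq', (SELECT max(id) FROM table2));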
Re: [GENERAL] Restore v. Running COPY/INDEX separately
Gregory Stark <[EMAIL PROTECTED]> writes:
>> On Sun, 26 Aug 2007, Benjamin Arai wrote:
>>> So, I built my tables which contains a TSearch2 field by
>>> 1. Create table without indexes
>>> 2. COPY data into table
>>> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
>>> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);

> Or you could set up a trigger to generate the tsvector when you first
> load the data instead of adding it later.

You're going to want such a trigger anyway, so installing it before the
COPY step seems like the Obviously Right Thing. Any other approach
implies rewriting the entire table after you've loaded it, with no
compensating advantage that I can see.

regards, tom lane
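Since the thread is using the contrib tsearch2 module (judging by the
to_tsvector('default', ...) calls), the trigger Greg and Tom describe
would look roughly like this - a sketch only, not posted in the thread,
with a made-up trigger name:

  -- Sketch: assumes the tsearch2 contrib module is installed; the trigger
  -- fills idxFTI from strMessage using the current (default) configuration.
  ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;

  CREATE TRIGGER tsvectorupdate
      BEFORE INSERT OR UPDATE ON tblMessages
      FOR EACH ROW EXECUTE PROCEDURE tsearch2(idxFTI, strMessage);

  -- COPY the data in afterwards; idxFTI is filled row by row as it loads,
  -- so no table-wide UPDATE (and no extra vacuum) is needed.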
Re: [GENERAL] Insert or Replace or \copy (bulkload)
On Tue, 2007-08-14 at 10:16 -0500, Scott Marlowe wrote:
> On 8/14/07, Ow Mun Heng <[EMAIL PROTECTED]> wrote:
> > I'm seeing an obstacle in my aim to migrate from MySQL to PG, mainly in
> > the manner in which PG handles duplicate entries, either from primary
> > keys or unique constraints.
> >
> > Data is taken from Perl DBI into (right now) CSV-based files to be
> > loaded into the table via psql's \copy command.
> >
> > In MySQL, I was using mysqlimport --replace, which essentially provided
> > the means to load data into the DB while providing the necessary logic
> > to replace the entire row if there was a duplicate, instead of dying.
> >
> > Under PG, I've yet to find something similar to this functionality, and
> > searches via Google have uncovered that this is one thing which _still_
> > has not found its way into PG. (Anyone know why? Standards?)
> >
> > Anyway, I found a workaround, but, to me, even though it provides a
> > means to an end, it still looks like it'll end up as a maintenance
> > nightmare each time a table has any additional columns added.
> >
> > The solution is taken from this site:
> >
> > http://www.pointwriter.com/blog/index.php?/archives/6-REPLACE-in-PostgreSQL.html
>
> Example code snipped for brevity
>
> > Can anyone tell me if this won't turn out to be a maintenance nightmare?
> > So, the pertinent question is, is there a better mousetrap available?
>
> I don't see why it would be a maintenance nightmare.  Looks pretty
> much like you just create it and go.  Once it's in place it should
> just work.  There are other ways to skin this particular cat, but that
> one seems as good as any.

I just ran into trouble with this. The rule seems to work when I do
simple inserts, but since what I will be doing is \copy bulk loads, it
balks and fails.

Is there a suitable replacement rule which can be used for this purpose
that will work for bulk loads? Now would be a good time to teach me how
to skin the cat differently.
Re: [GENERAL] Restore v. Running COPY/INDEX separately
Tom Lane wrote:
> Gregory Stark <[EMAIL PROTECTED]> writes:
> >> On Sun, 26 Aug 2007, Benjamin Arai wrote:
> >>> So, I built my tables which contains a TSearch2 field by
> >>> 1. Create table without indexes
> >>> 2. COPY data into table
> >>> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
> >>> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);
>
> > Or you could set up a trigger to generate the tsvector when you first
> > load the data instead of adding it later.
>
> You're going to want such a trigger anyway, so installing it before the
> COPY step seems like the Obviously Right Thing.  Any other approach
> implies rewriting the entire table after you've loaded it, with no
> compensating advantage that I can see.

Isn't the main speed advantage of the dump the fact that the
to_tsvector() results already come in the COPY data? The dump already
comes with the idxFTI column contents, instead of having to generate it
from scratch. That would depend on how expensive that function is, of
course.

--
Alvaro Herrera       http://www.amazon.com/gp/registry/DXLWNGRJD34J
"PHP is what I call the "Dumb Monkey" language. [A]ny dumb monkey can code
something in PHP. Python takes actual thought to produce something useful."
(J. Drake)
\copy ignoring Rules Was [Re: [GENERAL] Insert or Replace or \copy (bulkload)]
On Mon, 2007-08-27 at 11:55 +0800, Ow Mun Heng wrote:
> I just ran into trouble with this. The rule seems to work when I do
> simple inserts, but since what I will be doing is \copy bulk loads, it
> balks and fails.

> Now would be a good time to teach me how to skin the cat differently.

Based on the archives, it seems this was raised as BUG 2437 close to a
year ago and was marked as "correct behaviour" (rules are not applied to
COPY). The suggested alternatives are to either

1. Use pgloader (downloaded it, but I have not gone in depth to see if it
   suits my needs), or
2. COPY into a temp table, and then perform an INSERT INTO from it, as in
   the example below.

=> create table foo (a int, b int, c int, d varchar(1), e bigint);
=> alter table foo add primary key (a,b,c,e);
=> insert into foo values(1,1,1,'A',1);
=> create rule replace_foo as on insert to foo
   where (exists(select 1 from foo
                  where foo.a = new.a and foo.b = new.b
                    and foo.c = new.c and foo.e = new.e))
   do instead
   update foo set d = new.d
    where foo.a = new.a and foo.b = new.b
      and foo.c = new.c and foo.e = new.e;

=> \d foo
          Table "xmms.foo"
 Column |         Type         | Modifiers
--------+----------------------+-----------
 a      | integer              | not null
 b      | integer              | not null
 c      | integer              | not null
 d      | character varying(1) |
 e      | bigint               | not null
Indexes:
    "foo_pkey" PRIMARY KEY, btree (a, b, c, e)
Rules:
    replace_foo AS
    ON INSERT TO foo
   WHERE (EXISTS ( SELECT 1
           FROM foo
          WHERE foo.a = new.a AND foo.b = new.b AND foo.c = new.c AND foo.e = new.e)) DO INSTEAD
    UPDATE foo SET d = new.d
   WHERE foo.a = new.a AND foo.b = new.b AND foo.c = new.c AND foo.e = new.e

=> create table temp_table_foo as select * from foo where 1=0;

=> select * from foo;
 a | b | c | d | e
---+---+---+---+---
 1 | 1 | 1 | A | 1

=> insert into temp_table_foo values(1,1,1,'Z',1);
=> insert into foo select * from temp_table_foo;
=> select * from foo;
 a | b | c | d | e
---+---+---+---+---
 1 | 1 | 1 | Z | 1

=> truncate table foo

Seems like an awful lot of steps, and it may make the \copy process even
longer than it should be. Is there any other method? It seems to me
pgloader also does more or less the same thing. What about pg_bulkload?
Can anyone provide any new insights?

Thanks...

http://archives.postgresql.org/pgsql-bugs/2006-05/msg00073.php
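For what it's worth, a rough sketch of option 2 (the staging-table merge)
that avoids the rule entirely. It reuses the foo example above; the file
name is a placeholder and it assumes the same (a,b,c,e) key:

  -- Load the bulk data into a staging table, then merge it into foo.
  CREATE TEMP TABLE staging_foo AS SELECT * FROM foo WHERE 1 = 0;
  \copy staging_foo from 'data.csv' with csv

  BEGIN;
  -- Replace rows that already exist...
  UPDATE foo SET d = s.d
    FROM staging_foo s
   WHERE foo.a = s.a AND foo.b = s.b AND foo.c = s.c AND foo.e = s.e;
  -- ...and add the ones that don't.
  INSERT INTO foo
  SELECT s.*
    FROM staging_foo s
   WHERE NOT EXISTS (SELECT 1 FROM foo
                      WHERE foo.a = s.a AND foo.b = s.b
                        AND foo.c = s.c AND foo.e = s.e);
  COMMIT;

This keeps \copy itself dumb and fast, and the replace logic lives in two
ordinary statements instead of a rule, so adding a column to foo only
means touching the UPDATE's SET list.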
Re: [GENERAL] Restore v. Running COPY/INDEX separately
Why is a trigger faster than doing an ALTER after the table is created? I
thought a trigger would be slower because it would be invoked on every
iteration (each new row inserted) during the COPY process.

Benjamin

On Aug 26, 2007, at 8:43 PM, Tom Lane wrote:
> Gregory Stark <[EMAIL PROTECTED]> writes:
> > On Sun, 26 Aug 2007, Benjamin Arai wrote:
> >> So, I built my tables which contains a TSearch2 field by
> >> 1. Create table without indexes
> >> 2. COPY data into table
> >> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
> >> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);
>
> > Or you could set up a trigger to generate the tsvector when you first
> > load the data instead of adding it later.
>
> You're going to want such a trigger anyway, so installing it before the
> COPY step seems like the Obviously Right Thing.  Any other approach
> implies rewriting the entire table after you've loaded it, with no
> compensating advantage that I can see.
>
> regards, tom lane