[GENERAL] Moving Specific Data Across Schemas Including FKs
I have the need to move a specific set of data from one schema to another. These schemas are on the same database instance and have all of the same relations defined. The SQL to copy data from one table is relatively straightforward:

INSERT INTO schema_b.my_table
SELECT * FROM schema_a.my_table WHERE ...

What I am trying to figure out is: if I also have other relations that have foreign keys into the data I am moving, how would I also move the data from those relations and maintain the FK integrity?

The tables are set up to use BIGSERIAL values for the id column, which is the primary key, and the foreign keys reference these id columns. Ideally each schema would use its own serial for the ID values, but I'm open to clever alternatives.

I am using PG 9.3.5 (Amazon RDS), but with an option to move to 9.4 should the situation call for it.

thanks
--Cory
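For concreteness, a minimal sketch of the copy-in-dependency-order idea, assuming a hypothetical child table schema_a.my_child_table whose parent_id column references my_table(id), and a hypothetical account_id filter. Copying the id values as-is (rather than letting schema_b's BIGSERIAL assign new ones) is what keeps the FK references intact without any remapping, provided the copied ids don't collide with rows already in schema_b:

    -- copy the selected parent rows first, keeping their original id values
    INSERT INTO schema_b.my_table
    SELECT * FROM schema_a.my_table
    WHERE account_id = 42;                                     -- hypothetical filter

    -- then copy child rows whose parent is now present in schema_b
    INSERT INTO schema_b.my_child_table
    SELECT c.*
    FROM schema_a.my_child_table c
    WHERE c.parent_id IN (SELECT id FROM schema_b.my_table);   -- parent_id: hypothetical FK column

Afterwards schema_b's sequence would need a setval() to something past the copied max(id) so its own BIGSERIAL does not hand out colliding values later.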
Re: [GENERAL] Moving Specific Data Across Schemas Including FKs
On Thu, Apr 23, 2015 at 10:27 AM Steve Atkins wrote:
> On Apr 23, 2015, at 10:09 AM, Cory Tucker wrote:
> > I have the need to move a specific set of data from one schema to another.
> > These schemas are on the same database instance and have all of the same
> > relations defined. The SQL to copy data from one table is relatively
> > straightforward:
> >
> > INSERT INTO schema_b.my_table
> > SELECT * FROM schema_a.my_table WHERE ...
>
> Would ALTER TABLE ... SET SCHEMA do what you need? A schema is mostly just a
> name space, so there's no need to create new tables or copy data around.
>
> Cheers,
> Steve

If I were moving all of the contents from these table(s) then it might work, but I only want to move a specific selection of it based on a where clause.

--Cory
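For the subset-of-rows case, one pattern available on 9.1+ is a data-modifying CTE that deletes the selected rows from the source table and inserts exactly those rows into the target in a single statement. A minimal sketch, reusing the hypothetical filter from above; note that if child tables reference these rows, the DELETE will either fail or cascade, so in practice the children have to be handled in the same transaction (for example with DEFERRABLE foreign keys), or the rows copied first and deleted afterwards:

    WITH moved AS (
        DELETE FROM schema_a.my_table
        WHERE account_id = 42            -- hypothetical filter
        RETURNING *
    )
    INSERT INTO schema_b.my_table
    SELECT * FROM moved;

ALTER TABLE ... SET SCHEMA, by contrast, relocates the whole table, which is why it only helps when everything should move.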
[GENERAL] Grouping By Similarity (using pg_trgm)?
[pg version 9.3 or 9.4]

Suppose I have a simple table:

create table data (
  my_value TEXT NOT NULL
);
CREATE INDEX idx_my_value ON data USING gin(my_value gin_trgm_ops);

Now I would like to essentially do a group by to get a count of all the values that are sufficiently similar. I can do it using something like a CROSS JOIN to join the table on itself, but then I still get all the rows with duplicate counts.

Is there a way to do a group by query and only return a single "my_value" column and a count of the number of times other values are similar, without also returning the similar values themselves in the output?
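For concreteness, a minimal sketch of the kind of self-join the question describes, assuming the pg_trgm extension is installed (CREATE EXTENSION pg_trgm) and using its % operator, whose threshold on 9.3/9.4 is controlled by set_limit()/show_limit():

    SELECT a.my_value, count(*) AS similar_count
    FROM data a
    JOIN data b
      ON a.my_value % b.my_value          -- trigram-similar, per the current similarity limit
     AND a.ctid <> b.ctid                 -- don't count a row as similar to itself
    GROUP BY a.my_value;

This still returns one output row per distinct my_value, which is exactly the duplication the question is asking how to avoid.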
Re: [GENERAL] Grouping By Similarity (using pg_trgm)?
That produces pretty much the same results as the CROSS JOIN I was using before. Because each "my_value" in the table is different, if I group on just their value then I will always have the full result set and a bunch of essentially duplicated results.

Any other ideas/options?

On Thu, May 14, 2015 at 12:08 PM David G. Johnston <david.g.johns...@gmail.com> wrote:
> Concept below - not bothering to look up the functions/operators for pg_trgm:
>
> SELECT my_value_src, count(*)
> FROM (SELECT my_value AS my_value_src FROM data) src
> JOIN (SELECT my_value AS my_value_compareto FROM data) comparedto
>   ON ( func(my_value_src, my_value_compareto) < # )
> GROUP BY my_value_src
>
> David J.
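One way to collapse similar values into a single output row is to assign each value a representative and group on that; picking min() of the similar set is just a heuristic, not something pg_trgm provides directly, and the 0.5 threshold below is an assumption. A sketch using the real pg_trgm similarity() function:

    SELECT s.rep AS my_value, count(*) AS group_size
    FROM (
        SELECT a.my_value,
               min(b.my_value) AS rep       -- smallest similar value serves as the group label
        FROM data a
        JOIN data b ON similarity(a.my_value, b.my_value) > 0.5   -- threshold is an assumption
        GROUP BY a.my_value
    ) s
    GROUP BY s.rep;

Since every value is similar to itself, no rows are lost; values whose similar sets share the same minimum end up counted together.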
[GENERAL] Most effective settings for deleting lots of data?
Hi, I am using postgres 9.3 and am preparing to migrate to 9.4. As part of the migration, I was hoping to also delete a bunch of data that is no longer needed (100M+ rows across several tables). I can fairly trivially delete the data with a simple statement like this:

DELETE FROM account WHERE id = 1234;

All the foreign keys are set up with "ON DELETE CASCADE", so this will effectively drop the entire graph, which is what I want. My desire is to do this as quickly as possible.

The first thing I have done is to disable autovacuuming on all of the largest tables so that the entire system doesn't lock up, like so:

ALTER TABLE my_data SET (autovacuum_enabled = false, toast.autovacuum_enabled = false);

I was wondering what other settings might best be applied? For instance, I was thinking that "work_mem" could be bumped up to some ridiculously large value if needed. I have the benefit of exclusive access to the database at the time this operation needs to happen, so most options are on the table. What settings would you recommend?

Also, it just occurred to me that I should try to disable/drop all indexes (especially since they will be recreated later) so that those are not updated in the process.

thanks in advance,
--Cory
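A sketch of how that might be scripted during the exclusive-access window; the index name, column, and memory values are placeholders rather than recommendations. One caution: indexes on the referencing (FK) columns of the child tables are what the cascaded deletes use to find their rows, so dropping those particular indexes can make the cascade much slower.

    SET maintenance_work_mem = '2GB';            -- helps the index rebuilds and VACUUM afterwards
    SET work_mem = '256MB';

    DROP INDEX IF EXISTS idx_my_data_some_col;   -- hypothetical secondary index to skip maintaining

    DELETE FROM account WHERE id = 1234;         -- cascades through the ON DELETE CASCADE FKs

    CREATE INDEX idx_my_data_some_col ON my_data (some_col);
    VACUUM ANALYZE my_data;

    ALTER TABLE my_data SET (autovacuum_enabled = true, toast.autovacuum_enabled = true);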
[GENERAL] Efficiently Triggering Autovacuum Analyze?
We have a performance problem accessing one of our tables, I think because the statistics are out of date. The table is fairly large, on the order of 100M rows or so. The general structure of the table is as follows:

   Column    |  Type  |                    Modifiers
-------------+--------+---------------------------------------------------
 id          | bigint | not null default nextval('foo_id_seq'::regclass)
 record_id   | text   |
 account_id  | bigint | not null

With indexes:
 "foo_pkey" PRIMARY KEY, btree (id)
 "uq_account_id_record_id" UNIQUE CONSTRAINT, btree (account_id, record_id)

This table is almost always queried using a combination of (account_id, record_id) and is generally pretty fast. However, under certain loads, the query becomes slower and slower as time goes on. The workload that causes this to happen is when data for a new account_id is being inserted into the table. This will happen in rapid succession and may insert millions of rows over the course of several hours.

The pattern that I notice when this happens is that the CPU on the DB will be pegged much higher than usual, and the query to look up records for the (account_id, record_id) combo will steadily rise from <1ms to more than 2 or 3 seconds over time.

The fix I have employed to restore the speed of the query after I notice it is happening is to manually issue a VACUUM ANALYZE on the table. After the analyze is done, the query returns to its normal speed.

I am looking for suggestions for how to tune, or perhaps automatically detect this pattern, so that I don't have to manually intervene whenever this happens. Here are my autovacuum settings:

                name                 | setting | unit
-------------------------------------+---------+------
 autovacuum                          | on      |
 autovacuum_analyze_scale_factor     | 0.05    |
 autovacuum_analyze_threshold        | 50      |
 autovacuum_freeze_max_age           | 2       |
 autovacuum_max_workers              | 3       |
 autovacuum_multixact_freeze_max_age | 4       |
 autovacuum_naptime                  | 30      | s
 autovacuum_vacuum_cost_delay        | 20      | ms
 autovacuum_vacuum_cost_limit        | -1      |
 autovacuum_vacuum_scale_factor      | 0.1     |
 autovacuum_vacuum_threshold         | 50      |
 autovacuum_work_mem                 | -1      | kB

We're using 9.4.4 (RDS)
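One knob that can be turned without touching the global settings is per-table autovacuum storage parameters, plus an explicit ANALYZE at the end of each bulk load. A sketch, assuming the table is named foo (as the foo_pkey index name suggests); the threshold values are assumptions, not recommendations:

    -- make autoanalyze trigger much sooner on this one table
    ALTER TABLE foo SET (
        autovacuum_analyze_scale_factor = 0.01,
        autovacuum_analyze_threshold    = 10000
    );

    -- or analyze explicitly once a bulk load for a new account_id finishes
    ANALYZE foo;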
Re: [GENERAL] Efficiently Triggering Autovacuum Analyze?
On Wed, Dec 30, 2015 at 11:20 AM Tom Lane wrote:
> Cory Tucker writes:
> > This table is almost always queried using a combination of (account_id,
> > record_id) and is generally pretty fast. However, under certain loads, the
> > query becomes slower and slower as time goes on. The workload that causes
> > this to happen is when data for a new account_id is being inserted into the
> > table. This will happen in rapid succession and may insert millions of
> > rows over the course of several hours.
>
> Are those insertions happening in one enormous transaction, or even just
> a few very large ones?

No, one transaction per row insert.

> > The pattern that I notice when this happens is that the CPU on the DB will be
> > pegged much higher than usual, and the query to look up records for the
> > (account_id, record_id) combo will steadily rise from <1ms to more than 2
> > or 3 seconds over time.
>
> I'm suspicious that this is not autovacuum's fault but reflects the cost
> of checking uncommitted tuples to see if they've become committed yet.
> If so, there may be little you can do about it except break the insertion
> into smaller transactions ... which might or might not be all right from
> a data consistency standpoint.
>
> regards, tom lane
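A monitoring query along these lines (a sketch, again assuming the table is named foo) can help tell whether the slowdown tracks stale statistics or something else, by watching how far the table has drifted since its last analyze while the load is running:

    SELECT relname,
           n_live_tup,
           n_mod_since_analyze,   -- rows modified since the last ANALYZE (available in 9.4+)
           last_analyze,
           last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'foo';

If n_mod_since_analyze keeps climbing while last_autoanalyze stays put during the slow period, stale statistics are a plausible culprit; if an autoanalyze has run recently and the query is still slow, Tom's explanation fits better.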
Re: [GENERAL] Blocked updates and background writer performance
On Wed, Jan 13, 2016 at 9:48 AM Vick Khera wrote:
> That was my intuition too. Not enough I/O available from the hardware for
> the workload requested.
>
> As recommended, log your checkpoints and try tuning them to spread the
> load.

Thanks guys, will turn on checkpoint logging and try to sniff this out further.
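A cheap first step, shown as a sketch: look at the current checkpoint and background-writer settings before changing anything (on RDS the changes themselves go through the DB parameter group rather than the config file):

    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('log_checkpoints',
                   'checkpoint_completion_target',
                   'checkpoint_timeout',
                   'checkpoint_segments',
                   'bgwriter_delay',
                   'bgwriter_lru_maxpages');

With log_checkpoints = on, each "checkpoint complete" log line reports how many buffers were written and how long the write and sync phases took, which is usually enough to see whether checkpoints line up with the stalls.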
[GENERAL] Query Using Massive Temp Space
Hello, I have a query that is using a tremendous amount of temp disk space given the overall size of the dataset. I'd love for someone to try to explain what PG is doing and why it's using so much space for the query.

First off, the system is PG 9.6 on Ubuntu with 4 cores and 28 GB of RAM. The query in question is a fairly large join of several tables (6), including some aggregations. The overall dataset size of the 6 tables in question is about 20GB and the largest table is about 15M rows. The query is essentially a dump of *most* of the data from these tables joined together to be used in another system.

When the query runs it begins to use an aggressive amount of temp space on the volume over the course of many hours, until it reaches about 95% capacity and then tapers off. It never completes, though. The temp space it uses is around 1.5TB out of a 2TB volume. Again, the *total* size of the relations in question is only 20GB.

Can anyone explain how the query could possibly use so much temp space?

Query and query plan are attached. Please let me know if any more info is needed.

thanks
--Cory
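One thing that is cheap to check while the query runs (a sketch): how much temp-file space is actually being written at the database level:

    -- cumulative temp file usage per database
    SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_written
    FROM pg_stat_database
    ORDER BY temp_bytes DESC;

Setting log_temp_files = 0 additionally makes the server log every temporary file with its size as it is released. A join that multiplies rows (many-to-many matches on the join keys) can materialize far more data than the base tables hold, which is one common way a 20GB dataset turns into terabytes of temp space.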
[GENERAL] Determine all listeners subscribed to notifications and what channels
I'm interested in trying to figure out which channels have been subscribed to (using LISTEN). From what I could tell via a little Googling, there used to be a table named pg_catalog.pg_listener that contained all this information, but that seems to have disappeared somewhere in the 9.x release (I'm using 9.3.x). Is there a way to find out which channels have listeners? thanks --Cory
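For reference, a sketch of what is available within a single session on 9.x: pg_listening_channels() returns the channels the current session has LISTENed on, though (to my knowledge) there is no catalog view in 9.3 that exposes what other backends are listening to, which is the gap left by pg_listener's removal. The channel name below is hypothetical:

    LISTEN my_channel;                  -- my_channel is a made-up channel name
    SELECT pg_listening_channels();     -- returns 'my_channel', but only for this session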