Re: Migrating PostgreSQL Stored Procedures to MSSQL 2019 for example
I would suggest there is no easy way to do this with a tool. PostgreSQL is powerful because, in addition to PL/pgSQL, you can write functions in other languages (Python, Perl, Tcl, JavaScript, C++, ...) and there are many popular extensions, so it is very possible to encapsulate complex application/business logic within the database. But if it is just a database with a bit of PL/pgSQL exposing data to a front-end application that does all the work, then maybe a tool might work.

You may have noticed how powerful AWS SCT is at converting MSSQL and Oracle to PostgreSQL, but alas, I believe it will only convert PostgreSQL to Postgres-like databases (Aurora and MySQL). Not to say someone somewhere hasn't built one, but a claimed strength of PostgreSQL is probably a hindrance in this instance.

Hector Vass
Data Engineer
07773 352559


On Fri, 12 Aug 2022, 11:24 Scott Simpson, wrote:

> Hi,
>
> I need to migrate many PostgreSQL stored procedures and functions to MSSQL.
>
> I can't find anything online that seems to handle this task.
>
> Are there any tools that you have that can do this job?
>
>
> *Kind Regards*
>
> *Zellis* | Scott Simpson | Senior Engineer
>
> Thorpe Park
> United Kingdom
> Work : +44 (0)20 3986 3523
> Email : scott.simp...@zellis.com
> Web : www.Zellis.com <http://www.zellis.com/>
Re: How to eliminate extra "NOT EXISTS"-query here?
Not sure you need to use an array; why not simple table joins? So a table with your criteria x y z t joined to stuff to give you candidates that do match, then a left join with coalesce to add the 'd':

select
  --a.id, b.test_id,
  coalesce(a.id, b.test_id) as finalresult
from test a
left join (
  select test_id
  from stuff a
  inner join (values ('x'),('y'),('z'),('t')) b (v) using (v)
  group by 1
) b on (a.id = b.test_id);

Regards
Hector Vass


On Sat, Nov 25, 2023 at 4:08 PM Tom Lane wrote:

> Andreas Joseph Krogh writes:
> > -- This works, but I'd rather not do the extra EXISTS
> > select * from test t
> > WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@
> >          (select array_agg(s.v) from stuff s WHERE s.test_id = t.id)
> >   OR NOT EXISTS (
> >     select * from stuff s where s.test_id = t.id
> >   )
> > )
> > ;
>
> > So, I want to return all entries in test not having any of ARRAY ['x', 'y',
> > 'z', 't'] referenced in the table stuff, and I'd like to have test.id="d"
> > returned as well, but in order to do that I need to execute the "or not
> > exists"-query. Is it possible to avoid that?
>
> Probably not directly, but perhaps you could improve the performance of
> this query by converting the sub-selects into a left join:
>
> select * from test t
>   left join
>     (select s.test_id, array_agg(s.v) as arr from stuff s group by s.test_id) ss
>     on ss.test_id = t.id
> WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr)
>    OR ss.test_id IS NULL;
>
> Another possibility is
>
> ...
> WHERE (ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr) IS NOT TRUE
>
> but I don't think that's more readable really, and it will save little.
>
> In either case, this would result in computing array_agg once for
> each group of test_id values in "stuff", while your original computes
> a similar aggregate for each row in "test". So whether this is better
> depends on the relative sizes of the tables, although my proposal
> avoids random access to "stuff" so it will have some advantage.
>
> regards, tom lane
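[Editor's note] For anyone who wants to try the queries in this thread, here is a minimal sketch of the tables they assume. The names are taken from the quoted queries; the column types and sample data are guesses, not from the original posts.

```
-- Assumed layout: test(id) and stuff(test_id, v); types are illustrative.
CREATE TABLE test  (id      varchar PRIMARY KEY);
CREATE TABLE stuff (test_id varchar REFERENCES test (id), v varchar);

INSERT INTO test  VALUES ('a'), ('b'), ('d');
INSERT INTO stuff VALUES ('a', 'x'), ('a', 'y'), ('a', 'z'), ('a', 't'),
                         ('b', 'x');

-- With this data the original query should return 'b' (only some of
-- x, y, z, t are present) and 'd' (no rows in stuff at all), but not 'a'.
```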
Re: How to eliminate extra "NOT EXISTS"-query here?
It is not equivalent to the use of NOT ARRAY, and it is entirely possible I have misunderstood the requirement. Do you have some more test cases that the non-array solution does not work for?

Regards
Hector Vass
07773 352559


On Mon, Nov 27, 2023 at 9:29 AM Dominique Devienne wrote:

> On Sat, Nov 25, 2023 at 5:53 PM hector vass wrote:
>
>> Not sure you need to use array why not simple table joins, so a table
>> with your criteria x y z t joined to stuff to give you candidates that do
>> match, then left join with coalesce to add the 'd'
>>
>> select
>>   --a.id, b.test_id,
>>   coalesce(a.id, b.test_id) as finalresult
>> from test a
>> left join (
>>   select test_id
>>   from stuff a
>>   inner join (values ('x'),('y'),('z'),('t')) b (v) using (v)
>>   group by 1
>> ) b on (a.id = b.test_id);
>
> Hi Hector. Hopefully this is not a stupid question...
>
> How is that equivalent to the `NOT ARRAY ... <@ ...` though?
> The inner-join-distinct above will return test_id's on any match, but you
> can't know if all array values are matches. Which is different from
>
> > Is the first array contained by the second
>
> from the <@ operator, no?
> I'm unfamiliar with these operators, so am I missing something?
> Just trying to understand the logic here. Thanks, --DD
Re: Get back the number of columns of a result-set prior to JSON aggregation
I think you are just trying to get the number of columns in the underlying table; there is no real cost to reading the metadata (a variant that also skips dropped columns is sketched after the quoted message below):

select count(id),
       (select count(attrelid) from pg_attribute
        where attrelid = 't1'::regclass and attnum > 0),
       json_agg(t)
from t1 t;

select count(id),
       (select count(attrelid) from pg_attribute
        where attrelid = 't2'::regclass and attnum > 0),
       json_agg(t)
from t2 t;

Regards
Hector Vass
07773 352559


On Tue, Nov 28, 2023 at 12:12 PM Dominique Devienne wrote:

> Hi. I've got a nice little POC using PostgreSQL to implement a REST API server.
> This uses json_agg(t) to generate the JSON of tables (or subqueries in general),
> which means I always get back a single row (and column, before I added the
> count(t.*)).
>
> But I'd like to get statistics on the number of rows aggregated (easy, count(*)),
> but also the number of columns of those rows! And I'm stuck for the latter...
>
> Is there a (hopefully efficient) way to get back the cardinality of a
> select-clause basically?
> Obviously programmatically I can get the row and column count from the result-set,
> but I see the result of json_agg() myself, while I want the value prior to
> json_agg().
>
> Is there a way to achieve this?
>
> Thanks, --DD
>
> PS: In the example below, would return 1 for the 1st query, and 2 for the 2nd.
>
> ```
> migrated=> create table t1 (id integer);
> CREATE TABLE
> migrated=> insert into t1 values (1), (2);
> INSERT 0 2
> migrated=> create table t2 (id integer, name text);
> CREATE TABLE
> migrated=> insert into t2 values (1, 'one'), (2, 'two');
> INSERT 0 2
> migrated=> select count(t.*), json_agg(t) from t1 t;
>  count |  json_agg
> -------+-------------
>      2 | [{"id":1}, +
>        |  {"id":2}]
> (1 row)
>
> migrated=> select count(t.*), json_agg(t) from t2 t;
>  count |         json_agg
> -------+--------------------------
>      2 | [{"id":1,"name":"one"}, +
>        |  {"id":2,"name":"two"}]
> (1 row)
> ```
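[Editor's note] One caveat not in the original reply: columns removed with ALTER TABLE ... DROP COLUMN remain in pg_attribute with attisdropped set, so a count that filters them out may be safer. A variant of the query above, using the same t1 example table:

```
-- Count only live user columns (attnum > 0 excludes system columns,
-- NOT attisdropped excludes dropped ones).
select count(t.*),
       (select count(*)
        from pg_attribute
        where attrelid = 't1'::regclass
          and attnum > 0
          and not attisdropped) as column_count,
       json_agg(t)
from t1 t;
```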
Re: Mimic ALIAS in Postgresql?
On Tue, 16 Jan 2024, 17:21 Ron Johnson, wrote:

> Some RDBMSs have CREATE ALIAS, which allows you to refer to a table by a
> different name (while also referring to it by the original name).
>
> We have an application running on DB2/UDB which (for reasons wholly
> unknown to me, and probably also to the current developer) extensively uses
> this with two schemas: MTUSER and MTQRY. For example, it sometimes refers to
> MTUSER.sometable and other times refers to it as MTQRY.sometable.
>
> My goal is to present a way to migrate from UDB to PG with as few
> application changes as possible. Thus, the need to mimic aliases.
>
> Maybe updatable views?
> CREATE VIEW mtqry.sometable AS SELECT * FROM mtuser.sometable;

I think views will work. An alternative might be to interpose a proxy that rewrites the SQL; https://www.galliumdata.com/ gives you an idea of what this might look like, although you could do a lite version yourself.
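[Editor's note] If there are many tables, the views can be generated from the catalog rather than written by hand. A rough sketch, assuming the schemas are literally named mtuser and mtqry and that mtqry already exists:

```
-- Create one pass-through view in mtqry for every table in mtuser.
DO $$
DECLARE
    tbl text;
BEGIN
    FOR tbl IN
        SELECT tablename FROM pg_tables WHERE schemaname = 'mtuser'
    LOOP
        EXECUTE format('CREATE VIEW mtqry.%I AS SELECT * FROM mtuser.%I',
                       tbl, tbl);
    END LOOP;
END;
$$;
```

Because each generated view is a plain SELECT * over a single table, PostgreSQL treats it as automatically updatable, so INSERT/UPDATE/DELETE through the mtqry name should behave much like the alias the application expects.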
Re: creating a subset DB efficiently ?
On Fri, Mar 8, 2024 at 4:22 PM David Gauthier wrote:

> Here's the situation
>
> - The DB contains data for several projects.
> - The tables of the DB contain data for all projects (data is not
>   partitioned on project name or anything like that).
> - The "project" identifier (table column) exists in a few "parent" tables
>   with many child... grandchild,... tables under them connected with foreign
>   keys defined with "on delete cascade". So if a record in one of the parent
>   table records is deleted, all of its underlying, dependent records get
>   deleted too.
> - New projects come in, and old ones need to be removed and "archived" in
>   DBs of their own. So there's a DB called "active_projects" and there's a
>   DB called "project_a_archive" (identical metadata).
> - The idea is to copy the data for project "a" that's in "active_projects"
>   to the "project_a_archive" DB AND delete the project a data out of
>   "active_projects".
> - Leave "project_a_archive" up and running if someone needs to attach to
>   that and get some old/archived data.
>
> The brute-force method I've been using is...
> 1) pg_dump "active_projects" to a (huge) file then populate
>    "project_a_archive" using that (I don't have the privs to create database,
>    IT creates an empty one for me, so this is how I do it).
> 2) go into the "project_a_archive" DB and run... "delete from par_tbl_1
>    where project <> 'a' ", "delete from par_tbl_2 where project <> 'a' ",
>    etc... leaving only project "a" data in the DB.
> 3) go into the "active_projects" DB and "delete from par_tbl_1 where
>    project = 'a' ", etc... removing project "a" from the "active_projects" DB.
>
> Ya, not very elegant, it takes a long time and it takes a lot of
> resources. So I'm looking for ideas on how to do this better.
>
> Related question...
> The "delete from par_tbl_a where project <> 'a' " is taking forever. I
> fear it's because it's trying to journal everything in case I want to
> rollback. But this is just in the archive DB and I don't mind taking the
> risk if I can speed this up outside of a transaction. How can I run a
> delete command like this without the rollback recovery overhead?

> (I don't have the privs to create database, IT creates an empty one for me,
> so this is how I do it).

That's a shame. You can do something similar with schemas and tablespaces:

- Template your existing schema to create a new schema for the project (pg_dump -s).
- Create a tablespace for this new project and schema.
- You can then move the physical tablespace to cheaper disk and use symbolic links, or archive and/or back it up at the schema level with pg_dump -n (a rough sketch of that step is shown below).

As long as you don't put anything in the public schema, all you are really sharing is roles; otherwise it is a bit like having a separate database.
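[Editor's note] A rough sketch of the schema-level archive step, assuming the data for each project has been moved into its own schema; the schema, database and file names here are illustrative, not from the original thread:

```
# Dump only the project_a schema from the live database...
pg_dump -n project_a -Fc -f project_a.dump active_projects

# ...restore it into the (empty) archive database IT created...
pg_restore -d project_a_archive project_a.dump

# ...then remove the schema from the live database in one cascading step,
# which avoids the slow row-by-row DELETE described above.
psql -d active_projects -c 'DROP SCHEMA project_a CASCADE;'
```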
Fwd: Getting error while upgrading
On Sat, Mar 9, 2024 at 12:18 PM omkar narkar wrote:

> Hello Team,
>
> I am trying to upgrade my edb 10.5 community version to postgres 15.6
> version and while doing this i am getting error regarding OIDS are not
> stable across Postgresql version (sys.callback_queue_table.user_data).
> Kindly help me to get the solution of this issue.
>
> Thanks and regards,
> Omkar Narkar

You usually get this error if there are composite data types, or data types that cannot be translated between 10.5 and 15.6. The clue may be in the error message just before it says 'OIDs are not stable across PostgreSQL versions'.

You state EDB 10.5 community, so I am guessing you are using pg_upgrade and going from Windows to Linux? I am impressed if you can do that; would you not end up with collation issues?

If you are using pg_upgrade, what does pg_upgrade --check say?

I would dump the schema to a SQL file (pg_dump -s > dumped.sql) and then run the SQL one command at a time to track down where you are going to have a problem.
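[Editor's note] For reference, a sketch of the two checks suggested above, if pg_upgrade is indeed in play. The binary and data directory paths and the database name are hypothetical and will differ on your system:

```
# --check only reports incompatibilities; it does not modify either cluster.
pg_upgrade --check \
    --old-bindir=/usr/edb/as10/bin       --new-bindir=/usr/pgsql-15/bin \
    --old-datadir=/var/lib/edb/as10/data --new-datadir=/var/lib/pgsql/15/data

# Dump just the schema so the problem object (e.g. a composite type) can be
# located by replaying the statements one at a time.
pg_dump -s mydb > dumped.sql
```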
Re: Insert with Jsonb column hangs
On Sat, Mar 9, 2024 at 3:02 PM kuldeep singh wrote:

> Hi,
>
> We are inserting data close to 1M record & having a single Jsonb column
> but query is getting stuck.
>
> We are using insert into select * .. , so all the operations are within
> the DB.
>
> If we are running select query individually then it is returning the data
> in 40 sec for all rows but with insert it is getting stuck.
>
> PG Version - 15.
>
> What could be the problem here ?
>
> Regards,
> KD

Inserting 1M rows, especially of JSON that can be large, variable in size, stored as blobs and indexed, is perhaps not the correct way to do this. Insert performance will also depend on your tuning; supporting transactions, many users, or bulk processing are three sides of a compromise.

You should perhaps consider that INSERT is for inserting a few rows into live tables. You might be better off using COPY or \copy (or pg_dump if you are just trying to replicate a large table); a small \copy sketch follows.
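[Editor's note] A minimal \copy sketch for the "just replicate a large table" case mentioned above; the table and file names are made up. \copy runs client-side in psql, so it also works without server file access:

```
-- Export the source rows, then bulk-load them into the target table.
\copy (SELECT * FROM source_table) TO 'source_table.csv' CSV
\copy target_table FROM 'source_table.csv' CSV
```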
Re: Insert with Jsonb column hangs
COPY syntax can include any valid SELECT statement:

COPY (any valid select statement joining tables and converting rows with row_to_json) TO 'some_dump_file';

or it can copy a view:

CREATE VIEW myview AS (any valid select statement joining tables and converting rows with row_to_json);
COPY myview TO 'some_dump_file';

A concrete sketch with invented table names follows after the quoted message below.

Regards
Hector Vass
07773 352559


On Sat, Mar 9, 2024 at 4:01 PM kuldeep singh wrote:

> Copy may not work in our scenario since we need to join data from multiple
> tables & then convert it to json using row_to_json. This json data
> eventually needs to be stored in a target table.
>
> Will it be better if we break the process into batches of like 10,000 rows
> & insert the data in its individual transactions? Or any other better
> solution available?
>
> On Sat, Mar 9, 2024 at 9:01 PM hector vass wrote:
>
>> On Sat, Mar 9, 2024 at 3:02 PM kuldeep singh wrote:
>>
>>> Hi,
>>>
>>> We are inserting data close to 1M record & having a single Jsonb column
>>> but query is getting stuck.
>>>
>>> We are using insert into select * .. , so all the operations are within
>>> the DB.
>>>
>>> If we are running select query individually then it is returning the
>>> data in 40 sec for all rows but with insert it is getting stuck.
>>>
>>> PG Version - 15.
>>>
>>> What could be the problem here ?
>>>
>>> Regards,
>>> KD
>>
>> insert 1M rows especially JSON that can be large, variable in size and
>> stored as blobs and indexed is not perhaps the correct way to do this
>> insert performance will also depend on your tuning. Supporting
>> transactions, users or bulk processing are 3x sides of a compromise.
>> you should perhaps consider that insert is for inserting a few rows into
>> live tables ... you might be better using copy or \copy, pg_dump if you are
>> just trying to replicate a large table
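[Editor's note] As mentioned above, a concrete sketch of COPY over a joined query. The orders/customers tables and the output path are invented stand-ins for whatever is actually being joined:

```
-- Server-side COPY of a joined query, one JSON document per output line.
COPY (
    SELECT row_to_json(j)
    FROM (
        SELECT o.id, o.total, c.name
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
    ) AS j
) TO '/tmp/orders.json';
```

Note that COPY ... TO a file writes on the server and needs the corresponding file-access privilege; the psql \copy equivalent writes on the client instead.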
Re: Insert with Jsonb column hangs
On Sat, Mar 9, 2024 at 4:10 PM Adrian Klaver wrote:

> On 3/9/24 08:00, kuldeep singh wrote:
> > Copy may not work in our scenario since we need to join data from
> > multiple tables & then convert it to json using row_to_json. This
> > json data eventually needs to be stored in a target table.
>
> Per:
>
> https://www.postgresql.org/docs/current/sql-copy.html
>
> "
> COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
>
> <...>
>
> query
>
>     A SELECT, VALUES, INSERT, UPDATE, or DELETE command whose results
>     are to be copied. Note that parentheses are required around the query.
>
>     For INSERT, UPDATE and DELETE queries a RETURNING clause must be
>     provided, and the target relation must not have a conditional rule, nor
>     an ALSO rule, nor an INSTEAD rule that expands to multiple statements.
> "
>
> > Will it be better if we break the process into batches of like 10,000
> > rows & insert the data in its individual transactions? Or any other
> > better solution available?
> >
> > On Sat, Mar 9, 2024 at 9:01 PM hector vass <hector.v...@gmail.com> wrote:
> >
> >     On Sat, Mar 9, 2024 at 3:02 PM kuldeep singh
> >     <kuldeeparor...@gmail.com> wrote:
> >
> >         Hi,
> >
> >         We are inserting data close to 1M record & having a single Jsonb
> >         column but query is getting stuck.
> >
> >         We are using insert into select * .. , so all the operations are
> >         within the DB.
> >
> >         If we are running select query individually then it is returning
> >         the data in 40 sec for all rows but with insert it is getting stuck.
> >
> >         PG Version - 15.
> >
> >         What could be the problem here ?
> >
> >         Regards,
> >         KD
> >
> >     insert 1M rows especially JSON that can be large, variable in size
> >     and stored as blobs and indexed is not perhaps the correct way to do
> >     this. insert performance will also depend on your tuning. Supporting
> >     transactions, users or bulk processing are 3x sides of a compromise.
> >     you should perhaps consider that insert is for inserting a few rows
> >     into live tables ... you might be better using copy or \copy, pg_dump
> >     if you are just trying to replicate a large table
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com

What Adrian Klaver said ^. I discovered even this works:

create view myview as (select row_to_json(mytable) from mytable);
create table newtable as select * from myview where 1=0;
copy myview to program 'psql mydb postgres -c ''copy newtable from stdin'' ';
Re: Please recommend postgresql.conf improvements for osm2pgsql loading Europe
On Sat, 30 Mar 2024, 10:04 Alexander Farber, wrote:

> Thank you, Justin -
>
> On Sat, Mar 30, 2024 at 4:33 AM Justin Clift wrote:
>
>> On 2024-03-30 05:53, Alexander Farber wrote:
>> > I use the following postgresql.conf in my Dockerfile
>> > ( the full version at https://stackoverflow.com/a/78243530/165071 ),
>> > when loading a 28 GByte large europe-latest.osm.pbf
>>
>> Not specific conf file improvements, but for an initial data load
>> have you done things like turning off fsync(), deferring index
>> creation until after the data load finishes, and that kind of thing?
>
> I will try the following commands in my Dockerfile then
> and later report back on any improvements:
>
> RUN set -eux && \
>     pg_ctl init && \
>     echo "shared_buffers = 1GB"                >> $PGDATA/postgresql.conf && \
>     echo "work_mem = 50MB"                     >> $PGDATA/postgresql.conf && \
>     echo "maintenance_work_mem = 10GB"         >> $PGDATA/postgresql.conf && \
>     echo "autovacuum_work_mem = 2GB"           >> $PGDATA/postgresql.conf && \
>     echo "wal_level = minimal"                 >> $PGDATA/postgresql.conf && \
>     echo "checkpoint_timeout = 60min"          >> $PGDATA/postgresql.conf && \
>     echo "max_wal_size = 10GB"                 >> $PGDATA/postgresql.conf && \
>     echo "checkpoint_completion_target = 0.9"  >> $PGDATA/postgresql.conf && \
>     echo "max_wal_senders = 0"                 >> $PGDATA/postgresql.conf && \
>     echo "random_page_cost = 1.0"              >> $PGDATA/postgresql.conf && \
>     echo "password_encryption = scram-sha-256" >> $PGDATA/postgresql.conf && \
>     echo "fsync = off"                         >> $PGDATA/postgresql.conf && \
>     pg_ctl start && \
>     createuser --username=postgres $PGUSER && \
>     createdb --username=postgres --encoding=UTF8 --owner=$PGUSER $PGDATABASE && \
>     psql --username=postgres $PGDATABASE --command="ALTER USER $PGUSER WITH PASSWORD '$PGPASSWORD';" && \
>     psql --username=postgres $PGDATABASE --command='CREATE EXTENSION IF NOT EXISTS postgis;' && \
>     psql --username=postgres $PGDATABASE --command='CREATE EXTENSION IF NOT EXISTS hstore;' && \
>     osm2pgsql --username=$PGUSER --database=$PGDATABASE --create --cache=6 --hstore --latlong /data/map.osm.pbf && \
>     rm -f /data/map.osm.pbf && \
>     pg_ctl stop && \
>     echo "fsync = on"                          >> $PGDATA/postgresql.conf && \
>     echo '# TYPE  DATABASE     USER      ADDRESS    METHOD'          > $PGDATA/pg_hba.conf && \
>     echo "local   all          postgres             peer"           >> $PGDATA/pg_hba.conf && \
>     echo "local   $PGDATABASE  $PGUSER              scram-sha-256"  >> $PGDATA/pg_hba.conf && \
>     echo "host    $PGDATABASE  $PGUSER   0.0.0.0/0  scram-sha-256"  >> $PGDATA/pg_hba.conf
>
> The later fsync = on will override the former, right?
>
> Best regards
> Alex

2 hrs sounds reasonable for Europe; it's a big place in terms of OSM data, and osm2pgsql does processing to convert to geometry objects before doing anything on the PostgreSQL side. If you examine the --log-sql output for a small test country you can see what it does on the PostgreSQL side.

osm2pgsql gives you options to trim the output to only what you need (so if you don't want waterways, traffic features, parking places or places of worship etc., why load them?).

Hopefully you have found the excellent Geofabrik (https://download.geofabrik.de/) source for OSM data. Rather than loading this data afresh each update cycle, you would be better off simply loading the changes, i.e. the .osc files; Osmosis will create the equivalent of a diff file for you.

It looks like you are already using osm2pgsql's recommended postgresql.conf settings, so I'd be surprised if this was way off.
Getting as close to the tin as possible, rather than virtual machines and containers, will also help; there is a lot of I/O going on here.

If you are only interested in the geography, you might consider Geofabrik's shapefiles, available for many countries; they have already done some of the work for you.

Apologies if you are already a long way down this route and are just asking about the final stage of loading the osm2pgsql output into PostgreSQL, but however well you do here I would only expect small marginal gains. If you do switch to loading change files, a rough sketch is below.
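[Editor's note] A rough sketch of the change-file approach, not taken from the thread. The file names and cache size are illustrative, and applying diffs with --append requires the initial import to have been run with --slim so the intermediate tables are kept:

```
# Initial load, run once, keeping the slim (intermediate) tables for later updates.
osm2pgsql --create --slim --username=$PGUSER --database=$PGDATABASE \
          --cache=6000 --hstore --latlong /data/europe-latest.osm.pbf

# Subsequent update cycles: apply only the changes instead of reloading everything.
osm2pgsql --append --slim --username=$PGUSER --database=$PGDATABASE \
          --cache=6000 --hstore --latlong /data/changes.osc.gz
```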