Re: [GENERAL] PostgreSQL vs Mongo
If they pull out the "Mongo is faster than PostgreSQL" card, I'll state that I investigated this for my current employer and with roughly equivalent configurations (because it's impossible to get exactly equivalent) I was getting roughly the same performance from each. It's difficult to measure exactly, but I would recommend that if performance is a reason pulling you toward Mongo, that you don't listen to the internet hype and actually test both systems with your workload before assuming one will be faster than the other. On Wed, 16 Oct 2013 20:11:43 -0700 Chris Travers wrote: > On Wed, Oct 16, 2013 at 8:30 AM, CS DBA wrote: > > > All; > > > > One of our clients is talking about moving to Mongo for their > > reporting/data mart. I suspect the real issue is the architecture of their > > data mart schema, however I don't want to start pushing back if I can't > > back it up. > > > > You want to find out what the issues are before making your pitch. Get > specifics. > > > > > Anyone have any thoughts on why we would / would not use Mongo for a > > reporting environment. > > > > what are the use cases where mongo is a good fit? > > > > The argument for NoSQL in reporting is where the variety of data makes > traditional reporting difficult. This is a niche case, and not a typical > data mart. > > > > what are the drawbacks long term? > > > > If you use the wrong tool for the job. you are going to find yourself coded > into corners. The tradeoff is that if you allow data variety on the way > in, you can't ensure simple mathematical transformation of that data to be > meaningful on the way out. This means that the precision of your answers > goes down once you eliminate schemas. Where you don't have to, you should > not go with a NoSQL solution for reporting. > > After all, reporting really is the forte of SQL and has been for a long > time. > > > > is mongo a persistent db or simply a big memory cache? > > does mongo have advantages over Postgres hstore? > > > > I assume Mongo docs can be nested JSON? Also you have some intraquery > parallelism at least between nodes. The latter can be solved with careful > use of Postgres-XC. The former would make XML on PostgreSQL a better > comparison. > > In general these things need to be details-oriented. It is critically > important to find out if they are considering it due to hype or whether > they have real reasons for the comparison. Maybe in some cases, NoSQL > options may be better, but these are relatively rare, particularly in > analytic environments. > > -- > Best Wishes, > Chris Travers > > Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor > lock-in. > http://www.efficito.com/learn_more.shtml -- Bill Moran -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
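For the PostgreSQL side, one way to run such a test is to replay a file of representative statements with pgbench; a rough sketch, where workload.sql is a hypothetical file containing your own queries against your own schema:

    pgbench -i -s 50 mydb                       # or load your real data set instead
    pgbench -c 16 -T 300 -f workload.sql mydb   # 16 clients, 5 minutes, your statements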
Re: [GENERAL] PostgreSQL vs Mongo
One of the strengths/weaknesses of Mongo are with the similarity between code and access. This simplifies development from a developer's perspective, but complicates from an administrator perspective. If you want an informed opinion, ask the same question on the Mongo mailing list. Also look into for example Apache Cassandra. HTH Kind regards/met vriendelijke groet, Serge Fonville http://www.sergefonville.nl 2013/10/17 Bill Moran > > If they pull out the "Mongo is faster than PostgreSQL" card, I'll state > that > I investigated this for my current employer and with roughly equivalent > configurations (because it's impossible to get exactly equivalent) I was > getting roughly the same performance from each. It's difficult to measure > exactly, but I would recommend that if performance is a reason pulling you > toward Mongo, that you don't listen to the internet hype and actually test > both systems with your workload before assuming one will be faster than the > other. > > On Wed, 16 Oct 2013 20:11:43 -0700 Chris Travers > wrote: > > > On Wed, Oct 16, 2013 at 8:30 AM, CS DBA > wrote: > > > > > All; > > > > > > One of our clients is talking about moving to Mongo for their > > > reporting/data mart. I suspect the real issue is the architecture of > their > > > data mart schema, however I don't want to start pushing back if I can't > > > back it up. > > > > > > > You want to find out what the issues are before making your pitch. Get > > specifics. > > > > > > > > Anyone have any thoughts on why we would / would not use Mongo for a > > > reporting environment. > > > > > > what are the use cases where mongo is a good fit? > > > > > > > The argument for NoSQL in reporting is where the variety of data makes > > traditional reporting difficult. This is a niche case, and not a typical > > data mart. > > > > > > > what are the drawbacks long term? > > > > > > > If you use the wrong tool for the job. you are going to find yourself > coded > > into corners. The tradeoff is that if you allow data variety on the way > > in, you can't ensure simple mathematical transformation of that data to > be > > meaningful on the way out. This means that the precision of your answers > > goes down once you eliminate schemas. Where you don't have to, you > should > > not go with a NoSQL solution for reporting. > > > > After all, reporting really is the forte of SQL and has been for a long > > time. > > > > > > > is mongo a persistent db or simply a big memory cache? > > > does mongo have advantages over Postgres hstore? > > > > > > > I assume Mongo docs can be nested JSON? Also you have some intraquery > > parallelism at least between nodes. The latter can be solved with > careful > > use of Postgres-XC. The former would make XML on PostgreSQL a better > > comparison. > > > > In general these things need to be details-oriented. It is critically > > important to find out if they are considering it due to hype or whether > > they have real reasons for the comparison. Maybe in some cases, NoSQL > > options may be better, but these are relatively rare, particularly in > > analytic environments. > > > > -- > > Best Wishes, > > Chris Travers > > > > Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor > > lock-in. > > http://www.efficito.com/learn_more.shtml > > > -- > Bill Moran > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: [GENERAL] pg_similarity
I tried the installation as suggested at http://pgsimilarity.projects.pgfoundry.org/. After "make install", run the following command:

    psql -f SHAREDIR/contrib/pg_similarity.sql mydb

Here SHAREDIR is /usr/local/pgsql/share/extension/ (you can see the file pg_similarity.sql under this directory), and mydb is your database name. Provide the correct path and database, and pg_similarity will create its classes, functions, operators, etc.

After that you can load pg_similarity using the SQL command:

    load 'pg_similarity';

or copy the sample file from the tarball (pg_similarity.conf.sample) to PGDATA (as pg_similarity.conf) and include the following line in postgresql.conf:

    include 'pg_similarity.conf'

- Thanks and Regards, Sachin Kotwal NTT-DATA-OSS Center (Pune)

-- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-similarity-tp5774125p5774835.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
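Putting those steps together, an install session might look roughly like this (the paths are the ones mentioned above - adjust them to wherever "make install" actually put the files - and mydb is a placeholder database name):

    # create the pg_similarity objects (functions, operators, ...) in your database
    psql -f /usr/local/pgsql/share/extension/pg_similarity.sql mydb

    # load the shared library in a session (takes effect only for that session)
    psql -d mydb -c "load 'pg_similarity';"

    # or make its settings available permanently, assuming $PGDATA is your data directory
    cp pg_similarity.conf.sample $PGDATA/pg_similarity.conf
    echo "include 'pg_similarity.conf'" >> $PGDATA/postgresql.conf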
Re: [GENERAL] C-language stored function and float4 type
How did you check the result type? Can you explain in detail? - Thanks and Regards, Sachin Kotwal NTT-DATA-OSS Center (Pune) -- View this message in context: http://postgresql.1045698.n5.nabble.com/C-language-stored-function-and-float4-type-tp5773493p5774840.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Index creation fails with automatic names
Hi all,

I am not sure if this is a bug or a misuse on my part.

I am creating a number of indices in parallel on a table by using xargs. To do that, I write all my indices in a file indices.idx, and then have the indices built in parallel (in this case with 5 concurrent processes):

    cat indices.idx | xargs -P5 -I# psql -1 -c '#'

indices.idx contains lines like this:

    ALTER TABLE schema.table1 ADD CONSTRAINT pk_activity PRIMARY KEY (field_sk);
    CREATE INDEX ON schema.table1 ((LOWER(field2)));
    CREATE INDEX ON schema.table1 ((LOWER(field3)));
    CREATE INDEX ON schema.table1 (field4, field5);
    CREATE INDEX ON schema.table1 (field4, field6, field5);

Upon running the above command, I see the following error:

    ALTER TABLE
    CREATE INDEX
    ERROR: duplicate key value violates unique constraint "pg_class_relname_nsp_index"
    DETAIL: Key (relname, relnamespace)=(table1_lower_idx, 2064404) already exists.

My question is then - where does this error come from? Is this because Postgres allocates the same name (table1_lower_idx) twice when the index begins building, because at that time there's no index present with that name? But if one index finishes earlier, then the second one can't be committed because it has the same name as an already present index?

Any clarifications would be greatly appreciated!

Best, Flo

P.S. Please CC me, I am not currently subscribed.

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Idle transactions in PostgreSQL 9.2.4
Hi all,

I have some idle transactions in a PostgreSQL 9.2.4 server which never end. My application was working fine on version 9.1 (BSD), but the problem appeared immediately as we ported it to 9.2.4 on Linux. The idle operations would frequently appear as COMMITs, but sometimes I observed INSERTs as well. Please note that each commit below refers to *very light* inserts of single rows which are supposed to last milliseconds. However, some of them just never end, even if they are not waiting on other operations. See below:

    SELECT datname,usename,pid,client_addr,waiting,query_start,query FROM pg_stat_activity;

     datname  |  usename  |  pid  | client_addr | waiting |          query_start          | query
    ----------+-----------+-------+-------------+---------+-------------------------------+-------------------------------------------
     AppQoSDB | appqosusr | 17015 | 127.0.0.1   | f       | 2013-10-15 10:21:38.501435+01 | select AppQoSConfig.get_ip_currentnode()
     AppQoSDB | appqosusr | 17016 | 127.0.0.1   | f       | 2013-10-15 10:21:38.502346+01 | COMMIT
     AppQoSDB | appqosusr | 17017 | 127.0.0.1   | f       | 2013-10-15 10:21:38.584794+01 | select AppQoSConfig.get_ip_currentnode()
     AppQoSDB | appqosusr | 17018 | 127.0.0.1   | f       | 2013-10-15 10:21:38.586073+01 | COMMIT
     AppQoSDB | appqosusr | 17019 | 127.0.0.1   | f       | 2013-10-15 09:20:32.724517+01 | select AppQoSConfig.get_ip_currentnode()
     AppQoSDB | appqosusr | 17021 | 127.0.0.1   | f       | 2013-10-15 10:21:38.622651+01 | select AppQoSConfig.get_ip_currentnode()
     AppQoSDB | appqosusr | 17020 | 127.0.0.1   | f       | 2013-10-15 09:20:32.724433+01 | select AppQoSConfig.get_ip_currentnode()
     AppQoSDB | appqosusr | 17022 | 127.0.0.1   | f       | 2013-10-15 10:21:37.368831+01 | select AppQoSConfig.get_ip_currentnode()
     AppQoSDB | appqosusr | 17024 | 127.0.0.1   | f       | 2013-10-15 09:20:32.828307+01 | COMMIT
     AppQoSDB | appqosusr | 17026 | 127.0.0.1   | f       | 2013-10-15 10:21:38.624378+01 | COMMIT
     AppQoSDB | appqosusr | 17023 | 127.0.0.1   | f       | 2013-10-15 09:20:32.828302+01 | COMMIT
     AppQoSDB | appqosusr | 17025 | 127.0.0.1   | f       | 2013-10-15 10:21:37.369869+01 | COMMIT
     AppQoSDB | appqosusr | 17027 | 127.0.0.1   | f       | 2013-10-15 10:21:38.633244+01 | SELECT datname,usename,pid,client_addr,waiting,query_start,query FROM pg_stat_activity;

This is the current configuration:

                 name             | current_setting
    ------------------------------+------------------------------------------------------------------------------------------------------------
     version                      | PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2, 64-bit
     autovacuum                   | on
     autovacuum_analyze_threshold | 50
     autovacuum_max_workers       | 1
     autovacuum_naptime           | 1h
     autovacuum_vacuum_threshold  | 50
     checkpoint_completion_target | 0.7
     checkpoint_segments          | 128
     checkpoint_warning           | 30s
     client_encoding              | UTF8
     effective_cache_size         | 16GB
     lc_collate                   | en_GB.UTF-8
     lc_ctype                     | en_GB.UTF-8
     listen_addresses             | *
     log_destination              | syslog, stderr
     log_min_duration_statement   | 2min
     log_rotation_age             | 10d
     log_rotation_size            | 100MB
     logging_collector            | on
     max_connections              | 200
     max_stack_depth              | 2MB
     server_encoding              | UTF8
     shared_buffers               | 6GB
     synchronous_commit           | off
     TimeZone                     | GB
     wal_buffers                  | 128kB
     work_mem                     | 18MB

Thank you,
Svetlin Manavski
[GENERAL] Remove or alter the default access privileges of the public schema by the database owner
Hi everyone

I'm looking for a way to let a role which created a new database (i.e. is the database owner) change (remove) the default access privileges of the public schema, which allow everyone to use and create objects within this schema. I do not want to give the role the SUPERUSER option.

    List of schemas
      Name  |  Owner   |   Access privileges   |      Description
    --------+----------+-----------------------+------------------------
     public | postgres | postgres=UC/postgres +| standard public schema
            |          | =UC/postgres          |

Basically, I have an administrative role with the CREATEROLE, CREATEDB and NOSUPERUSER options set. This role needs to be able to do the following:

    CREATE DATABASE "<dbname>" OWNER "<dbowner>" ENCODING ...;
    REVOKE ALL ON DATABASE "<dbname>" FROM PUBLIC;
    GRANT CONNECT, TEMPORARY ON DATABASE "<dbname>" TO "<role>";
    GRANT ALL ON SCHEMA public TO "<dbowner>";
    REVOKE ALL ON SCHEMA public FROM PUBLIC;
    GRANT USAGE ON SCHEMA public TO "<role>";
    ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO "<role>";

All the grants/revokes on the public schema fail because the role is not the owner of the public schema. To circumvent this I've tried the following: I created a new template database (as a superuser) with the owner of the public schema set to the role in question, and used this template for all database creations. This solves the grant/revoke problem on the public schema, but the role is unable to create databases with different collation settings ("new collation (...) is incompatible with the collation of the template database"), and there are a lot of different collation settings needed. Dropping the public schema beforehand on the template is also no option, as I have to use template0.

Is there a way to either let the owner of a database own the public schema by default, or to ignore the collation settings on the template database (it will never have any data preloaded, only the ownership of the public schema changed)? Or maybe there is a completely different approach to solve this problem.

Many thanks and best regards
Christian

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Help function to sort string
Hi everyone,

Please help me with this function:

1. I want to sort a string alphabetically, and I used this query:

    select string_agg(x, ';') from (select trim(unnest(regexp_split_to_array('ECD FORM; BODY; PREDILUTED; CHROMO-GENIC; AUTO;RABBIT; FORMAT',';'))) x order by x) a;
    -- result: AUTO; BODY; CHROMOGENIC; ECD FORM; FORMAT; PREDILUTED; RABBIT --> I expected this result

In my database I have a column with a lot of rows of data. I want to turn that query into a function so it is easier to use, but I cannot write the function :(. Please help me.

For example, I have a column "data_text" with data like this:

    Row 1: AUTO; BODY; PREDILUTED; ECD FORM; RABBIT; FORMAT; CHROMOGENIC
    Row 2: ECD FORM; BODY; PREDILUTED; CHROMO-GENIC; AUTO; RABBIT; FORMAT
    Row 3: FORMAT; ECD FORM; AUTO
    Row 4: ANHYDROUS; DENATURED; PREDILUTED; CHROMOGENIC

When I run the function, the result should be:

    Row 1: AUTO; BODY; CHROMOGENIC; ECD.FORM; FORMAT; PREDILUTED; RABBIT
    Row 2: AUTO; BODY; CHROMO-GENIC; ECD FORM; FORMAT; PREDILUTED; RABBIT
    Row 3: AUTO; ECD FORM; FORMAT
    Row 4: ANHYDROUS; CHROMOGENIC; DENATURED; PREDILUTED

Thank you and best regards,

-- View this message in context: http://postgresql.1045698.n5.nabble.com/Help-function-to-sort-string-tp5774638.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
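For reference, one way to wrap that query in a function is sketched below. This is only a rough sketch: the function name sort_terms is made up here, it assumes the delimiter is always ';', and it re-joins the terms with '; ' the way the expected results above are formatted.

    CREATE OR REPLACE FUNCTION sort_terms(p_text text) RETURNS text
    LANGUAGE sql STABLE AS
    $$
        -- split on ';', trim whitespace, sort alphabetically, re-join
        SELECT string_agg(x, '; ' ORDER BY x)
        FROM (SELECT trim(unnest(regexp_split_to_array(p_text, ';'))) AS x) AS a;
    $$;

    -- usage against the example column (table name is hypothetical):
    -- SELECT sort_terms(data_text) FROM your_table;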
Re: [GENERAL] C-language stored function and float4 type
Sorry, I created an overloaded function and forgot about it.

> whiplash writes:
>> Hello! I have C-language stored function like this:
>>
>> CREATE FUNCTION testfunc ( x real, y real ) RETURNS real AS 'testfunc' LANGUAGE C STRICT;
>>
>> If i use this function (SELECT testfunc ( 0.1, 0.2 )) I get result with type "double precision".
>
> What's your basis for asserting that? The function is clearly defined to return float4.
>
> regards, tom lane

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
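As an aside, a quick way to spot an overload mix-up like this is to ask the catalogs which versions of a function exist and what each one is declared to return, e.g.:

    SELECT p.oid::regprocedure AS signature,
           p.prorettype::regtype AS return_type
      FROM pg_proc p
     WHERE p.proname = 'testfunc';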
[GENERAL] Missing record in binary replica 9.3.0
All,

I found something odd in the following setup. I have 3 DBs:

    db01 (9.3.0) master
    db02 (9.3.0) replica, same network
    db03 (9.3.0) sync replica, another network 80ms away

db01 has around 30 - 300 record updates per second. At some point today I found there is a record missing from both replicas; data after that point has been replicated, though.

On db01:

    select * from transaction.account where client_detail_id = '163042';
        id   | client_detail_id |       last_modified
    ---------+------------------+----------------------------
     1736861 | 163042           | 2013-10-17 10:39:18.312815

In both other DBs there is no such record. Does anyone have any idea how this could happen? Then on db01 I updated the last_modified field for that record, and suddenly the new record and new last_modified value appeared in both replicas.

Regards
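One thing worth checking while the row is missing is how far each standby has received and replayed WAL, to see whether replay is simply lagging; a rough sketch using the standard 9.3 functions:

    -- on the master (db01):
    SELECT pg_current_xlog_location();

    -- on each replica (db02, db03):
    SELECT pg_is_in_recovery(),
           pg_last_xlog_receive_location(),
           pg_last_xlog_replay_location();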
Re: [GENERAL] Index creation fails with automatic names
On Mon, Oct 14, 2013 at 6:31 AM, Florian Nigsch wrote: > My question is then - where does this error come from? Is is because > Postgres allocates the same name (table1_lower_idx) twice when the index > begins building, because at that time there's no index present with that > name? But if one index finishes earlier, then the second one can't be > committed because it has the same name as an already present index? > > looks like the auto-generated names for your indexes clash. Give them explicit names. Also, subscribe to the list if you want help. this is not your private help desk.
Re: [GENERAL] Index creation fails with automatic names
2013/10/14 Florian Nigsch : > Hi all, > > I am not sure if this is a bug or a misuse on my part. > > I am creating a number of indices in parallel on a table by using xargs. To > do that, I write all my indices in a file indices.idx, and then have the > indices build in parallel (in this case with 5 concurrent processes) > > cat indices.idx | xargs -P5 -I# psql -1 -c '#' > > indices.idx contains lines like this: > > ALTER TABLE schema.table1 ADD CONSTRAINT pk_activity PRIMARY KEY (field_sk); > > CREATE INDEX ON schema.table1 ((LOWER(field2))); > CREATE INDEX ON schema.table1 ((LOWER(field3))); > CREATE INDEX ON schema.table1 (field4, field5); > CREATE INDEX ON schema.table1 (field4, field6, field5); > > > Upon running the above command, I see the following error: > > ALTER TABLE > CREATE INDEX > ERROR: duplicate key value violates unique constraint > "pg_class_relname_nsp_index" > DETAIL: Key (relname, relnamespace)=(table1_lower_idx, 2064404) already > exists. > > My question is then - where does this error come from? Is is because > Postgres allocates the same name (table1_lower_idx) twice when the index > begins building, because at that time there's no index present with that > name? But if one index finishes earlier, then the second one can't be > committed because it has the same name as an already present index? It works fine for me on Pg 9.3.1: postgres=# CREATE TABLE foo(val1 text, val2 text); CREATE TABLE postgres=# CREATE INDEX on foo((lower(val1))); CREATE INDEX postgres=# CREATE INDEX on foo((lower(val2))); CREATE INDEX postgres=# \d foo Table "public.foo" Column | Type | Modifiers +--+--- val1 | text | val2 | text | Indexes: "foo_lower_idx" btree (lower(val1)) "foo_lower_idx1" btree (lower(val2)) Which PostgreSQL version are you using? Are you sure there's not an index with the offending name already? Regards Ian Barwick -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Remove or alter the default access privileges of the public schema by the database owner
Hello Please accept my apologies for the double posting. The original mail was held off by the Majordomo mailing list software until a mailing list administrator would allow it to be delivered. Majordomo doesn't like the string 'remove' within the subject. Thereupon, I informed the mailing list admins that I will resend the message with a new subject and that they can safely delete the original message. Apparently, my notice got overlooked somehow. Sorry again. Christian On 15.10.2013 12:28, Christian Affolter wrote: Hi everyone I'm looking for a way to let a role which created a new database (is the database owner) change (remove) the default access privileges of the public schema, which allows everyone to use and create objects within this schema. I do not want to give the role the SUPERUSER option. [...] -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Index creation fails with automatic names
On Mon, Oct 14, 2013 at 5:31 AM, Florian Nigsch wrote: > Hi all, > > I am not sure if this is a bug or a misuse on my part. > > I am creating a number of indices in parallel on a table by using xargs. To > do that, I write all my indices in a file indices.idx, and then have the > indices build in parallel (in this case with 5 concurrent processes) > > cat indices.idx | xargs -P5 -I# psql -1 -c '#' > > indices.idx contains lines like this: > > ALTER TABLE schema.table1 ADD CONSTRAINT pk_activity PRIMARY KEY (field_sk); > > CREATE INDEX ON schema.table1 ((LOWER(field2))); > CREATE INDEX ON schema.table1 ((LOWER(field3))); > CREATE INDEX ON schema.table1 (field4, field5); > CREATE INDEX ON schema.table1 (field4, field6, field5); > > > Upon running the above command, I see the following error: > > ALTER TABLE > CREATE INDEX > ERROR: duplicate key value violates unique constraint > "pg_class_relname_nsp_index" > DETAIL: Key (relname, relnamespace)=(table1_lower_idx, 2064404) already > exists. > > My question is then - where does this error come from? Is is because > Postgres allocates the same name (table1_lower_idx) twice when the index > begins building, because at that time there's no index present with that > name? But if one index finishes earlier, then the second one can't be > committed because it has the same name as an already present index? > > Any clarifications would be greatly appreciated! hm. what happens when you set transaction isolation to serializable? merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] COPY table FROM STDIN doesn't show count tag
According to the docs, the output of a copy command should return a COPY count. It turns out this only happens when copying from a file, and not from STDIN.

> Outputs
> On successful completion, a COPY command returns a command tag of the form
>   COPY count
> The count is the number of rows copied.

    clone_test_rename=# create temp table aaa (field varchar(255));
    CREATE TABLE
    Time: 3.806 ms
    clone_test_rename=# copy aaa from '/tmp/test.dat';
    COPY 3
    Time: 0.623 ms
    clone_test_rename=# copy aaa from stdin;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 123
    >> 456
    >> hello
    >> \.
    Time: 8472.305 ms

Tested on 9.3.0.

Is there another way I can determine the number of rows copied, short of performing a pre and post count? (Other than not using stdin, that is.)
Re: [GENERAL] Idle transactions in PostgreSQL 9.2.4
2013/10/15 Svetlin Manavski > I have some idle transactions in PostgreSQL 9.2.4 server which never end. > My application was working fine on version 9.1 (BSD) but the problem > appeared immediately as we ported it to 9.2.4 on Linux. The idle operations > would frequently appear as COMMITs but sometimes I observed INSERTS as well. > Please note that each commit below refers to *very light* inserts of > single rows which are supposed to last milliseconds. However some of the > just never end even if they are not waiting on other operations. See below: > > SELECT datname,usename,pid,client_addr,waiting,query_start,query FROM > pg_stat_activity; > In 9.2 there's an extra column in the view that you're missing in your query: state. If state='active', then `query` shows _currently running_ query. Otherwise it shows _last query_ executed by the session. Check here: http://www.postgresql.org/docs/9.2/static/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW -- Victor Y. Yegorov
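So on 9.2 a query along these lines separates sessions that are actually executing something from sessions that are merely idle (whose query column only shows the last statement they ran):

    SELECT pid, state, waiting, query_start, query
      FROM pg_stat_activity
     WHERE state <> 'idle'
     ORDER BY query_start;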
[GENERAL] Strange results with pg_restore
Have a good day. This Friday I'm going to migrate an entire database installation for a government institution in my country. There are about 4 databases of 2GB each. So I was preparing what I'm going to do tomorrow. They currently have PostgreSQL 9.1 installed from source, I think, and they will update the RHEL version to 5.7. It will be a reinstall of the operating system, so before that I will back up the databases, copy the contents of pg_hba.conf and so on, and after the reinstallation of RHEL I'm going to install PostgreSQL 9.3 from the official repositories alongside contrib and plpython. I've already tested that on a CentOS 5.7 VM and it works fine.

But well, yesterday I was practicing creating and restoring backups on my laptop with the following specs:

    Core i7 2.2 GHz
    8GB RAM
    30GB empty space
    Windows 8.1

On my laptop I also have PostgreSQL 9.3. First I used the sample database pgdellstore: I created a database and ran the SQL file of the database. After that, I created a backup of that pgdellstore database with pg_dump, in custom format, without compression:

    pg_dump -h localhost -p 5432 -U postgres -W -Fc -Z0 -C -d pgdellstore > pgdellstore.backup

And everything was normal. After that I created a second database and restored my custom backup with pg_restore, trying to speed it up using the -j option:

    pg_restore -h localhost -p 5432 -U postgres -W -d pgdellstore -j4 pgdellstore.backup

Also, before that, I backed up my postgresql.conf and edited the current one with the following parameters, according to some research I did on Google about performance tuning of pg_restore:

    shared_buffers=2GB
    maintenance_work_mem=1GB
    work_mem=128MB
    wal_buffers=16MB
    checkpoint_segments=8
    autovacuum=off
    archive_mode=off
    fsync=off
    full_page_writes=off
    checkpoint_timeout=15min
    checkpoint_completion_target=0.9
    track_counts=off
    synchronous_commit=off
    bgwriter_delay=50ms

And it restored in about 2 seconds, I think; for me that was extremely fast (it is a database of approximately 25MB) and I had doubts. So I have some queries to check database table sizes, and I compared the results of the first database (the one I created with the original SQL script) and the second one (the one I created using the custom backup of the first one), and I got this:

DB created with original SQL script:

     schemaname |  tablename   | reltuples |  tamanio   | tamanioord
    ------------+--------------+-----------+------------+------------
     public     | customers    | 2         | 5016 kB    |    5136384
     public     | orderlines   | 60350     | 4440 kB    |    4546560
     public     | cust_hist    | 60350     | 3976 kB    |    4071424
     public     | products     | 1         | 1552 kB    |    1589248
     public     | orders       | 12000     | 1384 kB    |    1417216
     public     | inventory    | 1         | 704 kB     |     720896
     public     | categories   | 16        | 24 kB      |      24576
     public     | afiliado     | 4         | 24 kB      |      24576
     public     | pruebafechas | 0         | 8192 bytes |       8192
     public     | reorder      | 0         | 0 bytes    |          0
    (10 filas)

DB created with custom backup based on first db:

     schemaname |  tablename   | reltuples |  tamanio   | tamanioord
    ------------+--------------+-----------+------------+------------
     public     | customers    | 2         | 4992 kB    |    5111808
     public     | orderlines   | 60350     | 4416 kB    |    4521984
     public     | cust_hist    | 60350     | 3952 kB    |    4046848
     public     | products     | 1         | 1528 kB    |    1564672
     public     | orders       | 12000     | 1360 kB    |    1392640
     public     | inventory    | 1         | 680 kB     |     696320
     public     | afiliado     | 4         | 24 kB      |      24576
     public     | categories   | 16        | 24 kB      |      24576
     public     | pruebafechas | 0         | 8192 bytes |       8192
     public     | reorder      | 0         | 0 bytes    |          0
    (10 filas)

This is the query that I used to get those results:

    SELECT tbl.schemaname, tbl.tablename, obj.reltuples,
           pg_size_pretty(pg_total_relation_size(text('"' || tbl.schemaname || '"."' || tbl.tablename || '"'))) tamanio,
           pg_total_relation_size(text('"' || tbl.schemaname || '"."' || tbl.tablename || '"')) tamanioord
    FROM pg_tables tbl, pg_class obj
    WHERE tbl.tablename = obj.relname
      AND tbl.schemaname NOT IN ('pg_catalog', 'information_schema')
    ORDER BY tamanioord DESC;

As you can see, there are certain variations in size; the number of tuples remains the same, but the fields tamanio (formatted size) and tamanioord (size without format) differ. For example, the table customers has 5016 kB versus 4992 kB. I'm afraid that I somehow lost data or something like that, or, in your experience, is that normal and doesn't mean the information is corrupted? Because tomorrow, when I do that with the production databases, it will be chaos if information gets corrupted.

Also, my second concern is the following: 4 months ago, I did a
[GENERAL] day_trunc and day duration in a remote time zone
Hello,

I want to retrieve the day start and duration of an epoch within a given time zone and return the day start as an epoch. The queries below work as expected, but I would appreciate a simpler solution ...

Example: ( http://localtimes.info/Europe/Cyprus/Nicosia/ )

    select (TIMESTAMP WITH TIME ZONE 'epoch' + interval '1382886336 seconds') AT TIME ZONE 'Europe/Nicosia'
    => 2013-10-27 17:05:36

    SELECT EXTRACT (epoch FROM cast(date( (TIMESTAMP WITH TIME ZONE 'epoch' + interval '1382886336 seconds' ) AT TIME ZONE 'Europe/Nicosia') || ' 00:00 Europe/Nicosia' as timestamptz)) as day_start
    => 1382821200

    SELECT ( EXTRACT (epoch FROM cast(date( (TIMESTAMP WITH TIME ZONE 'epoch' + interval '1382983536 seconds' /* added 27 hours to the previous day_start result */ ) AT TIME ZONE 'Europe/Nicosia') || ' 00:00 Europe/Nicosia' as timestamptz)) - 1382821200 /* = day_start */ )/3600 as hour_duration
    => 25 hours, which is correct as daylight saving time ends on this date

regards,

Marc Mamin
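For comparison, a sketch of one possibly simpler equivalent for the day_start calculation: convert the epoch to local wall-clock time, truncate to the day, and convert back:

    SELECT EXTRACT(epoch FROM
             date_trunc('day', to_timestamp(1382886336) AT TIME ZONE 'Europe/Nicosia')
             AT TIME ZONE 'Europe/Nicosia') AS day_start;
    -- should also give 1382821200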
Re: [GENERAL] day_trunc and day duration in a remote time zone
On 10/17/2013 08:47 AM, Marc Mamin wrote:
> Hello,
> I want to retrieve the day start and duration of an epoch within a given time zone and return the day start as epoch.

Not quite sure what you are looking for, but something like:

    hplc=> select current_date::timestamp, extract(epoch from current_date::timestamp);
          timestamp      | date_part
    ---------------------+------------
     2013-10-17 00:00:00 | 1381968000

> regards,
> Marc Mamin

-- Adrian Klaver adrian.kla...@gmail.com

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PostgreSQL vs Mongo
On Wed, Oct 16, 2013 at 09:30:59AM -0600, CS DBA wrote:
- All;
-
- One of our clients is talking about moving to Mongo for their
- reporting/data mart. I suspect the real issue is the architecture of
- their data mart schema, however I don't want to start pushing back if I
- can't back it up.
-
- Anyone have any thoughts on why we would / would not use Mongo for a
- reporting environment.
-
- what are the use cases where mongo is a good fit?
- what are the drawbacks long term?
- is mongo a persistent db or simply a big memory cache?
- does mongo have advantages over Postgres hstore?
- etc...
-
- Thanks in advance...
-
- /Kevin

I work with both. Mongo doesn't really seem appropriate for a data mart.

Mongo supports Map Reduce and has an Aggregation framework (which will give you a lot of the functionality of SQL but is much more esoteric).

You need an index for every query you run and every possible sort order. Mongo will cancel your query if the result set hits a certain size without an index. Doing ad-hoc queries is HARD, and there are no joins: if it's not in your document, you basically have to pull both documents into your app and join them by hand.

Writes block reads; massive updates (like loading a data mart) will need to "yield" to allow reads to happen, and that only happens at a page fault.

You need to have enough memory to store your "working set" or performance tanks. In a data mart your working set is frequently the whole thing.

People throw around the "schemaless" thing, but really there is some schema: you have to know what you want your document to look like. So this means schema changes as you grow your product, etc. In a data mart you're not going to use 10gen's idea of a schema change methodology, "only apply data model changes when you access a record". That works if you're looking up a single document at a time, but not if you're mostly doing range scans and aggregations.

Mongo is very limited in how it can sort; we have a number of "sort fields" added to our documents that give us a different indexable sort order. For example, you can't do ORDER BY CASE statements.

IMO Mongo, like most NoSQL solutions, addresses write scaling and availability by making them easier to do. You can generally shard without bothering the application too much, and you get free seamless failover with the replica sets.

Hope this is helpful,
Dave

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
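To make the ORDER BY CASE point concrete, this is the kind of ordering that is trivial in SQL but awkward to express in Mongo (table and column names are made up for illustration):

    SELECT order_id, status, created_at
      FROM orders
     ORDER BY CASE WHEN status = 'urgent' THEN 0 ELSE 1 END,
              created_at DESC;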
Re: [GENERAL] Index creation fails with automatic names
Merlin Moncure wrote: > On Mon, Oct 14, 2013 at 5:31 AM, Florian Nigsch wrote: >> I am creating a number of indices in parallel on a table by using xargs. To >> do that, I write all my indices in a file indices.idx, and then have the >> indices build in parallel (in this case with 5 concurrent processes) >> >> cat indices.idx | xargs -P5 -I# psql -1 -c '#' >> >> indices.idx contains lines like this: >> >> ALTER TABLE schema.table1 ADD CONSTRAINT pk_activity PRIMARY KEY > (field_sk); >> >> CREATE INDEX ON schema.table1 ((LOWER(field2))); >> CREATE INDEX ON schema.table1 ((LOWER(field3))); >> CREATE INDEX ON schema.table1 (field4, field5); >> CREATE INDEX ON schema.table1 (field4, field6, field5); >> >> >> Upon running the above command, I see the following error: >> >> ALTER TABLE >> CREATE INDEX >> ERROR: duplicate key value violates unique constraint >> "pg_class_relname_nsp_index" >> DETAIL: Key (relname, relnamespace)=(table1_lower_idx, 2064404) already >> exists. >> >> My question is then - where does this error come from? Is is because >> Postgres allocates the same name (table1_lower_idx) twice when the index >> begins building, because at that time there's no index present with that >> name? But if one index finishes earlier, then the second one can't be >> committed because it has the same name as an already present index? I'm going to go along with the suggestion that you explicitly name them when you create the indices.idx file. When these all start together, they probably cannot see each others' catalog entries, and so don't think they are choosing duplicate names. > hm. what happens when you set transaction isolation to > serializable? I would not expect that to help; since system tables weren't using MVCC snapshots when SSI was implemented, they were excluded from serializable behavior. It might be worth revisiting that now that we have MVCC catalog access, but in this case it would just replace one type of error with another. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
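In practice that just means giving each index an explicit name in indices.idx, for example (the names here are arbitrary):

    CREATE INDEX table1_lower_field2_idx ON schema.table1 ((LOWER(field2)));
    CREATE INDEX table1_lower_field3_idx ON schema.table1 ((LOWER(field3)));
    CREATE INDEX table1_field4_field5_idx ON schema.table1 (field4, field5);
    CREATE INDEX table1_field4_field6_field5_idx ON schema.table1 (field4, field6, field5);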
Re: [GENERAL] Index creation fails with automatic names
Ian Lawrence Barwick wrote: > It works fine for me on Pg 9.3.1: > > postgres=# CREATE TABLE foo(val1 text, val2 text); > CREATE TABLE > postgres=# CREATE INDEX on foo((lower(val1))); > CREATE INDEX > postgres=# CREATE INDEX on foo((lower(val2))); > CREATE INDEX You seem to be creating the indexes one at a time, all on the same connection. The OP's problem occurs when starting five CREATE INDEX statements in five different sessions all at the same time. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] when do I analyze after concurrent index creation?
Hi, I have a very basic question. If I create index concurrently, then do I need to analyze the table? If yes, when? Please let me know. Thanks.
Re: [GENERAL] Need some help on Performance 9.0.4
On 17.10.2013 20:56, akp geek wrote:
> got the output from depesz and this the top on the system. thanks for the help ..

From depesz? That site works differently - you enter the explain plan into the form, it does some formatting/highlighting and you're supposed to submit the link to that page. E.g. http://explain.depesz.com/s/JwTB is for your query.

However it seems you've done some anonymization on the query, e.g. replacing the table/index/function names and string literals with some random words.

Anyway, the plan shows the estimates are reasonably accurate, but the index scan on romeo_four consumes ~99% of the query duration. Not sure if it can be improved, because we don't know the original query and the anonymization made it rather annoying to deal with.

> Regards
>
> load averages: 30.3, 28.8, 27.8; up 680+00:51:09  18:24:44
> 156 processes: 123 sleeping, 33 on cpu
> CPU states: 49.5% idle, 50.0% user, 0.4% kernel, 0.0% iowait, 0.0% swap
> Memory: 64G phys mem, 19G free mem, 63G total swap, 63G free swap
>
>   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
>  7965 postgres   1  20    0 3178M 3171M cpu/7   78:21  1.58% postgres
> 20638 postgres   1  20    0 3181M 3175M cpu/28 187:01  1.58% postgres
> 22819 postgres   1  30    0 3181M 3175M cpu/33 264:06  1.57% postgres
>  3789 postgres   1  20    0 3183M 3176M cpu/9  626:11  1.57% postgres
> 10375 postgres   1  30    0 3182M 3175M cpu/50 293:55  1.57% postgres
> 27335 postgres   1  30    0 3175M 3169M cpu/29  27:27  1.57% postgres
>  7959 postgres   1  30    0 3178M 3171M cpu/41 104:07  1.57% postgres
>  8156 postgres   1  30    0 3180M 3173M cpu/43 124:18  1.56% postgres
> 20640 postgres   1   0    0 3182M 3176M cpu/58 207:49  1.56% postgres
> 10584 postgres   1  30    0 3179M 3173M cpu/35  76:32  1.56% postgres
> 13984 postgres   1  30    0 3181M 3174M cpu/30 207:04  1.56% postgres
>  3575 postgres   1  20    0 3283M 3261M cpu/19   1:48  1.56% postgres
>  7964 postgres   1  40    0 3177M 3170M cpu/62  82:56  1.56% postgres
>  1299 postgres   1  40    0 3166M 3161M cpu/52   5:11  1.56% postgres
> 27692 postgres   1   0    0 3181M 3174M cpu/46 260:58  1.56% postgres

Once again, this top output is incomplete - all it lists are postgres processes (and it's unclear what they're doing), but there are certainly some other processes running. Moreover, those 15 processes account for only ~25% of CPU, but there's almost 50% CPU utilized. So where's the rest?

Tomas

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] How do I create a box from fields in a table?
I need to determine whether a given pair of coordinates is inside a given rectangle. According to the documentation, PostgreSQL provides the box and point types and a "contains" operator that will be perfect for this. However, the example provided in the documentation only shows the creation of boxes and points from constant values. So, the following query works: select tran_car_identification, box '((0, 1), (2, 3))' from transfer_car But if I try to use a field from the transfer_car table, it doesn't work: select tran_car_identification, box '((location, 1), (2, 3))' from transfer_car That gives me an "invalid input syntax for type box" error. How do I create a box object using data from the transfer_car table? Thanks very much! RobR -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How do I create a box from fields in a table?
On Thu, Oct 17, 2013 at 4:04 PM, Rob Richardson wrote: > I need to determine whether a given pair of coordinates is inside a given > rectangle. According to the documentation, PostgreSQL provides the box and > point types and a "contains" operator that will be perfect for this. > However, the example provided in the documentation only shows the creation of > boxes and points from constant values. So, the following query works: > > select tran_car_identification, box '((0, 1), (2, 3))' from transfer_car > > But if I try to use a field from the transfer_car table, it doesn't work: > > select tran_car_identification, box '((location, 1), (2, 3))' from > transfer_car > > That gives me an "invalid input syntax for type box" error. > > How do I create a box object using data from the transfer_car table? you have to construct the string. this is somewhat baroque by modern postgres standards but should work: select tran_car_identification, format('((%s, 1), (2, 3))', location)::box from transfer_car; merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
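If you prefer to avoid building strings, the geometric constructor functions can do the same thing; a sketch, treating "location" as the x coordinate exactly as in the original query:

    SELECT tran_car_identification,
           box(point(location, 1), point(2, 3))
      FROM transfer_car;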
[GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?
Hi, I've got a question about converting a database from ASCII to UTF-8: what's the best approach if the database size is very large? A detailed procedure or any experience sharing would be much appreciated! Thanks, Suya
Re: [GENERAL] werid error"index row size 3040 exceeds btree maximum, 2712" occur randomly
Thanks Tomas! I'll spend some time on the link you sent, new learner of Postgres :-) Thanks, Suya -Original Message- From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Tomas Vondra Sent: Wednesday, October 16, 2013 9:06 AM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] werid error"index row size 3040 exceeds btree maximum, 2712" occur randomly On 15.10.2013 03:44, Huang, Suya wrote: > Thanks Tomas! > > However, in the example I sent, I already did a vacuum full right > after deleted the rows causing problem, before created the index and > got an error even the table is vacuumed. Note, the table is I > temporarily created using create table as select *... so no other > people is accessing that table, except me for the testing purpose. > > Any ideas? And today, while I did the same thing, I can create index > on the table right after I deleted the rows causing problem, without > vacuum. > > Anything I missed here? Not sure. The only thing I can think of is another transaction preventing the autovacuum from removing the rows, but I can't reproduce it. What PostgreSQL version are you using? BTW take a look at this contrib module: http://www.postgresql.org/docs/9.1/static/pageinspect.html It allows looking directly on items on a data page like this: select * from heap_page_items(get_raw_page('test', 0)); Maybe that'll help you identify the problem. Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Index creation fails with automatic names
2013/10/18 Kevin Grittner : > Ian Lawrence Barwick wrote: > >> It works fine for me on Pg 9.3.1: >> >> postgres=# CREATE TABLE foo(val1 text, val2 text); >> CREATE TABLE >> postgres=# CREATE INDEX on foo((lower(val1))); >> CREATE INDEX >> postgres=# CREATE INDEX on foo((lower(val2))); >> CREATE INDEX > > You seem to be creating the indexes one at a time, all on the same > connection. The OP's problem occurs when starting five CREATE > INDEX statements in five different sessions all at the same time. (reads original email again) ah yes, brain was not properly engaged. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?
On Oct 17, 2013, at 3:13 PM, "Huang, Suya" wrote: > Hi, > > I’ve got a question of converting database from ascii to UTF-8, what’s the > best approach to do so if the database size is very large? Detailed procedure > or experience sharing are much appreciated! > The answer to that depends on what you mean by "ascii". If your current database uses SQL_ASCII encoding - that's not ascii. It could have anything in there, including any mix of encodings and there's been no enforcement of any encoding, so there's no way of knowing what they are. If you've had, for example, webapps that let people paste word documents into them, you potentially have different encodings used in different rows of the same table. If your current data is like that then you're probably looking at doing some (manual) data cleanup to work out what encoding your data is really in, and converting it to something consistent rather than a simple migration from ascii to utf8. Cheers, Steve -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
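One cheap preliminary check is whether a plain dump of the data already happens to be valid UTF-8, in which case the cleanup may be minimal; roughly, with mydb as a placeholder database name:

    pg_dump mydb > mydb.sql
    # iconv stops with an error at the first byte sequence that is not valid UTF-8
    iconv -f UTF-8 -t UTF-8 mydb.sql > /dev/null && echo "dump is valid UTF-8 (or plain 7-bit ASCII)"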
Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?
On 10/17/2013 3:13 PM, Huang, Suya wrote:
> I've got a question about converting a database from ASCII to UTF-8: what's the best approach if the database size is very large? A detailed procedure or any experience sharing would be much appreciated!

I believe you will need to dump the whole database and import it into a new database that uses UTF8 encoding. As far as I know, there's no way to convert the encoding in place.

As the other gentlemen pointed out, you also will have to convert/sanitize all text data, as your current SQL_ASCII fields could easily contain stuff that's not valid UTF8. For large databases, this is a major undertaking. I find it's often easiest to do a major change like this between the old and a new database server.

-- john r pierce 37N 122W somewhere on the middle of the left coast

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?
Thanks Steve, Yes, we're using SQL_ASCII. Would you please be more specific about manual data cleanup work here? I'm new to Postgres and don't have any experience in character set conversion before, so any specific experience shared would be very much appreciated. Thanks, Suya -Original Message- From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Steve Atkins Sent: Friday, October 18, 2013 11:08 AM To: pgsql-general@postgresql.org General Subject: Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ? On Oct 17, 2013, at 3:13 PM, "Huang, Suya" wrote: > Hi, > > I've got a question of converting database from ascii to UTF-8, what's the > best approach to do so if the database size is very large? Detailed procedure > or experience sharing are much appreciated! > The answer to that depends on what you mean by "ascii". If your current database uses SQL_ASCII encoding - that's not ascii. It could have anything in there, including any mix of encodings and there's been no enforcement of any encoding, so there's no way of knowing what they are. If you've had, for example, webapps that let people paste word documents into them, you potentially have different encodings used in different rows of the same table. If your current data is like that then you're probably looking at doing some (manual) data cleanup to work out what encoding your data is really in, and converting it to something consistent rather than a simple migration from ascii to utf8. Cheers, Steve -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?
On 10/17/2013 08:51 PM, Huang, Suya wrote: Thanks Steve, Yes, we're using SQL_ASCII. Would you please be more specific about manual data cleanup work here? I'm new to Postgres and don't have any experience in character set conversion before, so any specific experience shared would be very much appreciated. This is a good place to start: http://www.postgresql.org/docs/9.3/interactive/multibyte.html Thanks, Suya -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?
Forgot to mention, we're using a very old version which is 8.3.11. I'll take a look at the guide for 8.3 with similar section. Thanks Adrian! -Original Message- From: Adrian Klaver [mailto:adrian.kla...@gmail.com] Sent: Friday, October 18, 2013 3:05 PM To: Huang, Suya; Steve Atkins; pgsql-general@postgresql.org General Subject: Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ? On 10/17/2013 08:51 PM, Huang, Suya wrote: > Thanks Steve, > > Yes, we're using SQL_ASCII. > > Would you please be more specific about manual data cleanup work here? I'm > new to Postgres and don't have any experience in character set conversion > before, so any specific experience shared would be very much appreciated. This is a good place to start: http://www.postgresql.org/docs/9.3/interactive/multibyte.html > > Thanks, > Suya > -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?
Yes John, we probably will use a new database server here to accommodate those converted databases.

By saying export/import, do you mean:

    1. pg_dump (// should I specify -E UTF8 to dump the data in UTF-8 encoding?)
    2. create database xxx -E UTF8
    3. pg_restore

I also see someone doing this the following way:

    1. perform a plain text dump of the database:
           pg_dump -f db.sql [dbname]
    2. convert the character encodings:
           iconv db.sql -f ISO-8859-1 -t UTF-8 -o db.utf8.sql
    3. create the UTF8 database:
           createdb utf8db
       (// I'm not sure why he's not specifying the DB encoding here; maybe better to use -E to specify the encoding as UTF8)
    4. restore the converted UTF8 database:
           psql -d utf8db -f db.utf8.sql

Which method is better? From what I can tell now, the second approach would generate a bigger dump file, so it's better to pipe it to bzip to get a compressed file. But other than that, any other considerations?

Thanks,
Suya

-Original Message-
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce
Sent: Friday, October 18, 2013 11:23 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

On 10/17/2013 3:13 PM, Huang, Suya wrote:
> I've got a question about converting a database from ASCII to UTF-8: what's the best approach if the database size is very large? A detailed procedure or any experience sharing would be much appreciated!

I believe you will need to dump the whole database and import it into a new database that uses UTF8 encoding. As far as I know, there's no way to convert the encoding in place.

As the other gentlemen pointed out, you also will have to convert/sanitize all text data, as your current SQL_ASCII fields could easily contain stuff that's not valid UTF8. For large databases, this is a major undertaking. I find it's often easiest to do a major change like this between the old and a new database server.

-- john r pierce 37N 122W somewhere on the middle of the left coast

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?
On 10/17/2013 9:49 PM, Huang, Suya wrote:
> Yes John, we probably will use a new database server here to accommodate those converted databases.
>
> By saying export/import, do you mean:
> 1. pg_dump (// should I specify -E UTF8 to dump the data in UTF-8 encoding?)
> 2. create database xxx -E UTF8
> 3. pg_restore

I don't believe 8.3 supported multiple different encodings on the same server instance, that's relatively new.

Before you can import your SQL_ASCII data, you need to know what charset the data is actually in. Is it UTF8 data stored in SQL_ASCII, or is it LATIN1 (ISO-8859)? Or some sort of Big5 or euc_cn? Or what? If it is already UTF8 data, are you sure that there are no invalid encodings accidentally stored? Postgres with SQL_ASCII does no character validation... if it's all USASCII (0x00 to 0x7F) then you're OK.

I would strongly recommend this new database server be running a currently supported version; I'd probably use 9.2. Configure the old server to allow the postgres user on the new server to connect and log on, and while logged onto the new server, run something like...

    pg_dump -h oldserver -U postgres -E oldencoding -s -f olddatabase.schema.sql olddatabase
    pg_dump -h oldserver -U postgres -E oldencoding -a -f olddatabase.data.sql olddatabase
    createuser newuser
    createdb -O newuser -l en_US.utf8 newdbname
    psql -d newdbname -U newuser -f olddatabase.schema.sql
    psql -d newdbname -U newuser -f olddatabase.data.sql

If the data import fails due to an invalid encoding, then you may have to pass the .data.sql file through iconv (and remove the set client_encoding sql commands from it).

-- john r pierce 37N 122W somewhere on the middle of the left coast

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?
Hi John,

" Is it UTF8 data stored in SQL_ASCII, or is it LATIN1 (ISO-8859)? Or some sort of Big5 or euc_cn? Or what? If it is already UTF8 data, are you sure that there are no invalid encodings accidentally stored? Postgres with SQL_ASCII does no character validation... if it's all USASCII (0x00 to 0x7F) then you're OK."

Question: how can I pull out the real character set information from the database? Does it rely on an understanding of the business knowledge? So the real data stored in the database decides whether we need to use iconv to convert it to UTF-8, and if the data is USASCII, then the pg_dump/restore process you provided should be sufficient and complete. Do I understand this correctly?

Besides, we do have databases with different encodings on the same server instance, see below:

    Welcome to psql 8.3.11, the PostgreSQL interactive terminal.

    Type: \copyright for distribution terms
          \h for help with SQL commands
          \? for help with psql commands
          \g or terminate with semicolon to execute query
          \q to quit

    postgres=# \l
                 List of databases
              Name             |  Owner  | Encoding
    -------------------------+---------+-----------
     admin                   | dba     | UTF8
     pgdb_1_sky              | pgdb_1  | SQL_ASCII
     pgdb_1_sky_utf8         | pgdb_1  | UTF8
     pgdb_1_ca               | pgdb_1  | SQL_ASCII
     pgdb_1_us               | pgdb_1  | SQL_ASCII
     pgdb_sky                | pgdb    | SQL_ASCII
     pgdb_sky_users          | pgdb    | SQL_ASCII
     pgdb_sky_users_utf8     | pgdb    | UTF8
     pgdb_sky_utf8           | pgdb    | UTF8
     pgdb_sky_utf8_86465_old | pgdb    | UTF8

Thanks,
Suya

-Original Message-
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce
Sent: Friday, October 18, 2013 4:12 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

On 10/17/2013 9:49 PM, Huang, Suya wrote:
> Yes John, we probably will use a new database server here to accommodate those converted databases.
>
> By saying export/import, do you mean:
> 1. pg_dump (// should I specify -E UTF8 to dump the data in UTF-8 encoding?)
> 2. create database xxx -E UTF8
> 3. pg_restore

I don't believe 8.3 supported multiple different encodings on the same server instance, that's relatively new.

Before you can import your SQL_ASCII data, you need to know what charset the data is actually in. Is it UTF8 data stored in SQL_ASCII, or is it LATIN1 (ISO-8859)? Or some sort of Big5 or euc_cn? Or what? If it is already UTF8 data, are you sure that there are no invalid encodings accidentally stored? Postgres with SQL_ASCII does no character validation... if it's all USASCII (0x00 to 0x7F) then you're OK.

I would strongly recommend this new database server be running a currently supported version; I'd probably use 9.2. Configure the old server to allow the postgres user on the new server to connect and log on, and while logged onto the new server, run something like...
    pg_dump -h oldserver -U postgres -E oldencoding -s -f olddatabase.schema.sql olddatabase
    pg_dump -h oldserver -U postgres -E oldencoding -a -f olddatabase.data.sql olddatabase
    createuser newuser
    createdb -O newuser -l en_US.utf8 newdbname
    psql -d newdbname -U newuser -f olddatabase.schema.sql
    psql -d newdbname -U newuser -f olddatabase.data.sql

If the data import fails due to an invalid encoding, then you may have to pass the .data.sql file through iconv (and remove the set client_encoding sql commands from it).

-- john r pierce 37N 122W somewhere on the middle of the left coast

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?
On 10/17/2013 10:51 PM, Huang, Suya wrote:
> Question: how can I pull out the real character set information from the database? Does it rely on an understanding of the business knowledge?

What did you store in it? Because it's declared SQL_ASCII, postgres doesn't know; it's all just bytes. You could have stored standard 7-bit USASCII, or you could have stored ISO-8859-1 (-2, -3, ...), or you could have stored one of the myriad non-UTF Asian multibyte character codes. Postgres doesn't know or care what you put in there, and it doesn't check to ensure it's valid.

IF all your data is in a consistent encoding, and you specify that encoding on the pg_dump command, then the psql command should be able to restore it as-is to the new UTF8 database via the magic of client_encoding. If the data is not consistent, you'll have a much harder time.

-- john r pierce 37N 122W somewhere on the middle of the left coast

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general