Re: [GENERAL] PostgreSQL vs Mongo

2013-10-17 Thread Bill Moran

If they pull out the "Mongo is faster than PostgreSQL" card, I'll state that
I investigated this for my current employer and with roughly equivalent
configurations (because it's impossible to get exactly equivalent) I was
getting roughly the same performance from each.  It's difficult to measure
exactly, but I would recommend that if performance is a reason pulling you
toward Mongo, that you don't listen to the internet hype and actually test
both systems with your workload before assuming one will be faster than the
other.

On Wed, 16 Oct 2013 20:11:43 -0700 Chris Travers  
wrote:

> On Wed, Oct 16, 2013 at 8:30 AM, CS DBA  wrote:
> 
> > All;
> >
> > One of our clients is talking about moving to Mongo for their
> > reporting/data mart.  I suspect the real issue is the architecture of their
> > data mart schema, however I don't want to start pushing back if I can't
> > back it up.
> >
> 
> You want to find out what the issues are before making your pitch.  Get
> specifics.
> 
> >
> > Anyone have any thoughts on why we would / would not use Mongo for a
> > reporting environment.
> >
> > what are the use cases where mongo is a good fit?
> >
> 
> The argument for NoSQL in reporting is where the variety of data makes
> traditional reporting difficult.  This is a niche case, and not a typical
> data mart.
> 
> 
> > what are the drawbacks long term?
> >
> 
> If you use the wrong tool for the job, you are going to find yourself coded
> into corners.  The tradeoff is that if you allow data variety on the way
> in, you can't ensure simple mathematical transformation of that data to be
> meaningful on the way out.  This means that the precision of your answers
> goes down once you eliminate schemas.  Unless you have to, you should
> not go with a NoSQL solution for reporting.
> 
> After all, reporting really is the forte of SQL and has been for a long
> time.
> 
> 
> > is mongo a persistent db or simply a big memory cache?
> > does mongo have advantages over Postgres hstore?
> >
> 
> I assume Mongo docs can be nested JSON?  Also you have some intraquery
> parallelism at least between nodes.  The latter can be solved with careful
> use of Postgres-XC.  The former would make XML on PostgreSQL a better
> comparison.
> 
> In general these things need to be detail-oriented.  It is critically
> important to find out if they are considering it due to hype or whether
> they have real reasons for the comparison.  Maybe in some cases, NoSQL
> options may be better, but these are relatively rare, particularly in
> analytic environments.
> 
> -- 
> Best Wishes,
> Chris Travers
> 
> Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
> lock-in.
> http://www.efficito.com/learn_more.shtml


-- 
Bill Moran 




Re: [GENERAL] PostgreSQL vs Mongo

2013-10-17 Thread Serge Fonville
One of the strengths/weaknesses of Mongo is the similarity between
code and access. This simplifies development from a developer's
perspective, but complicates things from an administrator's perspective. If you
want an informed opinion, ask the same question on the Mongo mailing list. Also
look into, for example, Apache Cassandra.

HTH

Kind regards/met vriendelijke groet,

Serge Fonville

http://www.sergefonville.nl


2013/10/17 Bill Moran 

>
> If they pull out the "Mongo is faster than PostgreSQL" card, I'll state
> that I investigated this for my current employer and with roughly
> equivalent configurations (because it's impossible to get exactly
> equivalent) I was getting roughly the same performance from each.
>
> [...]
>
> --
> Bill Moran 


Re: [GENERAL] pg_similarity

2013-10-17 Thread sachin kotwal
I tried the installation as suggested at
http://pgsimilarity.projects.pgfoundry.org/

After the "make install" command, run the following command:

psql -f SHAREDIR/contrib/pg_similarity.sql mydb

Here SHAREDIR is /usr/local/pgsql/share/extension/;
under this directory you can see the file pg_similarity.sql, and mydb is your
database name.
Provide the correct path and database, and pg_similarity will create its
classes, functions, operators, etc.

After that you can load pg_similarity using the SQL command load
'pg_similarity'; or
copy the sample file from the tarball (pg_similarity.conf.sample) to PGDATA (as
pg_similarity.conf) and include the following line in postgresql.conf:
include 'pg_similarity.conf'
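
For reference, a minimal sketch of the session-level alternative described
above (it assumes the objects were already created by pg_similarity.sql):

-- Load the pg_similarity shared library for the current session only;
-- the postgresql.conf include shown above makes it permanent instead.
LOAD 'pg_similarity';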




-
Thanks and Regards,

Sachin Kotwal
NTT-DATA-OSS Center (Pune)


Re: [GENERAL] C-language stored function and float4 type

2013-10-17 Thread sachin kotwal
How did you check the result type?
Can you explain in detail?



-
Thanks and Regards,

Sachin Kotwal
NTT-DATA-OSS Center (Pune)


[GENERAL] Index creation fails with automatic names

2013-10-17 Thread Florian Nigsch

Hi all,

I am not sure if this is a bug or a misuse on my part.

I am creating a number of indices in parallel on a table by using
xargs. To do that, I write all my indices in a file indices.idx, and
then have the indices built in parallel (in this case with 5
concurrent processes)


cat indices.idx | xargs -P5 -I# psql -1 -c '#'

indices.idx contains lines like this:

ALTER TABLE schema.table1 ADD CONSTRAINT pk_activity PRIMARY KEY (field_sk);

CREATE INDEX ON schema.table1 ((LOWER(field2)));
CREATE INDEX ON schema.table1 ((LOWER(field3)));
CREATE INDEX ON schema.table1 (field4, field5);
CREATE INDEX ON schema.table1 (field4, field6, field5);


Upon running the above command, I see the following error:

ALTER TABLE
CREATE INDEX
ERROR:  duplicate key value violates unique constraint  
"pg_class_relname_nsp_index"
DETAIL:  Key (relname, relnamespace)=(table1_lower_idx, 2064404)  
already exists.


My question is then - where does this error come from? Is this because
Postgres allocates the same name (table1_lower_idx) twice when the
index begins building, because at that time there's no index present
with that name? But if one index finishes earlier, then the second one
can't be committed because it has the same name as an already present
index?


Any clarifications would be greatly appreciated!

Best,

Flo

P.S. Please CC me, I am not currently subscribed.





[GENERAL] Idle transactions in PostgreSQL 9.2.4

2013-10-17 Thread Svetlin Manavski
Hi all,

I have some idle transactions on a PostgreSQL 9.2.4 server which never end.
My application was working fine on version 9.1 (BSD) but the problem
appeared immediately as we ported it to 9.2.4 on Linux. The idle operations
would frequently appear as COMMITs, but sometimes I observed INSERTs as well.
Please note that each commit below refers to *very light* inserts of single
rows which are supposed to last milliseconds. However, some of them just
never end, even though they are not waiting on other operations. See below:

SELECT datname,usename,pid,client_addr,waiting,query_start,query FROM
pg_stat_activity;

 datname  |  usename  |  pid  | client_addr | waiting |          query_start          | query
----------+-----------+-------+-------------+---------+-------------------------------+-------
 AppQoSDB | appqosusr | 17015 | 127.0.0.1   | f       | 2013-10-15 10:21:38.501435+01 | select AppQoSConfig.get_ip_currentnode()
 AppQoSDB | appqosusr | 17016 | 127.0.0.1   | f       | 2013-10-15 10:21:38.502346+01 | COMMIT
 AppQoSDB | appqosusr | 17017 | 127.0.0.1   | f       | 2013-10-15 10:21:38.584794+01 | select AppQoSConfig.get_ip_currentnode()
 AppQoSDB | appqosusr | 17018 | 127.0.0.1   | f       | 2013-10-15 10:21:38.586073+01 | COMMIT
 AppQoSDB | appqosusr | 17019 | 127.0.0.1   | f       | 2013-10-15 09:20:32.724517+01 | select AppQoSConfig.get_ip_currentnode()
 AppQoSDB | appqosusr | 17021 | 127.0.0.1   | f       | 2013-10-15 10:21:38.622651+01 | select AppQoSConfig.get_ip_currentnode()
 AppQoSDB | appqosusr | 17020 | 127.0.0.1   | f       | 2013-10-15 09:20:32.724433+01 | select AppQoSConfig.get_ip_currentnode()
 AppQoSDB | appqosusr | 17022 | 127.0.0.1   | f       | 2013-10-15 10:21:37.368831+01 | select AppQoSConfig.get_ip_currentnode()
 AppQoSDB | appqosusr | 17024 | 127.0.0.1   | f       | 2013-10-15 09:20:32.828307+01 | COMMIT
 AppQoSDB | appqosusr | 17026 | 127.0.0.1   | f       | 2013-10-15 10:21:38.624378+01 | COMMIT
 AppQoSDB | appqosusr | 17023 | 127.0.0.1   | f       | 2013-10-15 09:20:32.828302+01 | COMMIT
 AppQoSDB | appqosusr | 17025 | 127.0.0.1   | f       | 2013-10-15 10:21:37.369869+01 | COMMIT
 AppQoSDB | appqosusr | 17027 | 127.0.0.1   | f       | 2013-10-15 10:21:38.633244+01 | SELECT datname,usename,pid,client_addr,waiting,query_start,query FROM pg_stat_activity;


This is the current configuration:

             name             |                      current_setting
------------------------------+------------------------------------------------------
 version  | PostgreSQL 9.2.4 on
x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1)
4.7.2, 64-bit
 autovacuum   | on
 autovacuum_analyze_threshold | 50
 autovacuum_max_workers   | 1
 autovacuum_naptime   | 1h
 autovacuum_vacuum_threshold  | 50
 checkpoint_completion_target | 0.7
 checkpoint_segments  | 128
 checkpoint_warning   | 30s
 client_encoding  | UTF8
 effective_cache_size | 16GB
 lc_collate   | en_GB.UTF-8
 lc_ctype | en_GB.UTF-8
 listen_addresses | *
 log_destination  | syslog, stderr
 log_min_duration_statement   | 2min
 log_rotation_age | 10d
 log_rotation_size| 100MB
 logging_collector| on
 max_connections  | 200
 max_stack_depth  | 2MB
 server_encoding  | UTF8
 shared_buffers   | 6GB
 synchronous_commit   | off
 TimeZone | GB
 wal_buffers  | 128kB
 work_mem | 18MB


Thank you,
Svetlin Manavski


[GENERAL] Remove or alter the default access privileges of the public schema by the database owner

2013-10-17 Thread Christian Affolter

Hi everyone

I'm looking for a way to let a role which created a new database (is the 
database owner) change (remove) the default access privileges of the 
public schema, which allows everyone to use and create objects within 
this schema. I do not want to give the role the SUPERUSER option.


        List of schemas
  Name  |  Owner   |  Access privileges   |      Description
--------+----------+----------------------+------------------------
 public | postgres | postgres=UC/postgres+| standard public schema
        |          | =UC/postgres         |



Basically, I have an administrative role () with CREATEROLE, 
CREATEDB and NOSUPERUSER options set.


This role needs to be able to do the following:

CREATE DATABASE "" OWNER "" ENCODING...;

REVOKE ALL ON DATABASE "" FROM PUBLIC;
GRANT CONNECT, TEMPORARY ON DATABASE "" TO "";

GRANT ALL ON SCHEMA public TO "";
REVOKE ALL ON SCHEMA public FROM PUBLIC;
GRANT USAGE ON SCHEMA public TO "";
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT, UPDATE, 
DELETE ON TABLES TO "";


All the grants/revokes on the public schema fail because the role is not 
the owner of the public schema.


To circumvent this I've tried the following:

Created a new template database (as a superuser) in which the owner of the 
public schema is set to the  and used this template for all 
database creations. This solves the grant/revoke problem on the public 
schema, but the role is unable to create databases with different 
collation settings ("new collation (...) is incompatible with the 
collation of the template database"). And there are a lot of different 
collation settings needed.


Dropping the public schema beforehand on the template is also not an option, 
as I would have to use template0.



Is there a way to either let the owner of a database own the public 
schema by default, or to ignore the collation settings on the template 
database (it will never have any data preloaded, only the ownership of 
the public schema changed)? Or maybe there is a completely different approach 
to solve this problem.



Many thanks and best regards
Christian





[GENERAL] Help function to sort string

2013-10-17 Thread ginkgo36
Hi everyone,
Please help me with this function:

1. I want to sort a string alphabetically, and I used this query:
select string_agg(x, ';') from (select
trim(unnest(regexp_split_to_array('ECD FORM; BODY; PREDILUTED; CHROMO-GENIC;
AUTO;RABBIT; FORMAT',';'))) x order by x) a;

-- result: AUTO; BODY; CHROMOGENIC; ECD FORM;  FORMAT; PREDILUTED;  RABBIT
--> I expected this result

In my database I have a column with a lot of rows of data. I want to turn that
query into a function to make it easier to use, but I cannot write the function :(.
Please help me.
For example, I have column "data_text" with data like this:
Row 1: AUTO; BODY; PREDILUTED; ECD FORM; RABBIT; FORMAT; CHROMOGENIC
Row 2: ECD FORM; BODY; PREDILUTED; CHROMO-GENIC; AUTO; RABBIT; FORMAT
Row 3: FORMAT; ECD FORM; AUTO
Row 4: ANHYDROUS; DENATURED; PREDILUTED; CHROMOGENIC

When I run the function, the result should be:
Row 1: AUTO; BODY; CHROMOGENIC; ECD.FORM; FORMAT; PREDILUTED; RABBIT
Row 2: AUTO; BODY; CHROMO-GENIC; ECD FORM; FORMAT; PREDILUTED; RABBIT
Row 3: AUTO; ECD FORM; FORMAT
Row 4: ANHYDROUS; CHROMOGENIC; DENATURED; PREDILUTED
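
A minimal sketch of such a function, assuming the semicolon-separated format
shown above (the function name sort_terms is illustrative):

-- Wraps the sorting query above in a reusable SQL function.
CREATE OR REPLACE FUNCTION sort_terms(text)
RETURNS text LANGUAGE sql IMMUTABLE AS $$
    SELECT string_agg(x, '; ' ORDER BY x)
    FROM (SELECT trim(unnest(regexp_split_to_array($1, ';'))) AS x) a;
$$;

-- Usage: SELECT sort_terms(data_text) FROM your_table;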

Thank you and best regards,





Re: [GENERAL] C-language stored function and float4 type

2013-10-17 Thread whiplash

Sorry, I created an overloaded function and forgot about it.

whiplash  writes:

Hello! I have C-language stored function like this:
CREATE FUNCTION testfunc ( x real, y real )
RETURNS real AS
'testfunc'
LANGUAGE C STRICT;
If I use this function (SELECT testfunc ( 0.1, 0.2 )) I get a result with
type "double precision".

What's your basis for asserting that?  The function is clearly defined
to return float4.

regards, tom lane







[GENERAL] Missing record in binary replica 9.3.0

2013-10-17 Thread Kaveh Mousavi Zamani
All,

I found something in the following setup:

I have 3 DBs:

db01 (9.3.0) master
db02 (9.3.0) replica, same network
db03 (9.3.0) sync replica, another network 80ms away

db01 has around 30 - 300 record updates per second.

At some point today I found there is a missing record in both replicas.
Data after that point has been replicated, though.

db01
select * from transaction.account where client_detail_id ='163042';
   id    | client_detail_id |       last_modified
---------+------------------+----------------------------
 1736861 | 163042           | 2013-10-17 10:39:18.312815


In both other DBs there is no record.

Does anyone have any idea how this could happen?

Then in db01 I updated the last_modified field for that record, and suddenly
the record, with the new last_modified value, appeared in both replicas.

Regards


Re: [GENERAL] Index creation fails with automatic names

2013-10-17 Thread Vick Khera
On Mon, Oct 14, 2013 at 6:31 AM, Florian Nigsch  wrote:

> My question is then - where does this error come from? Is is because
> Postgres allocates the same name (table1_lower_idx) twice when the index
> begins building, because at that time there's no index present with that
> name? But if one index finishes earlier, then the second one can't be
> committed because it has the same name as an already present index?
>
>
Looks like the auto-generated names for your indexes clash. Give them
explicit names, as in the sketch below.
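
A sketch based on the statements from the original post (the index names are
illustrative):

CREATE INDEX table1_lower_field2_idx ON schema.table1 ((LOWER(field2)));
CREATE INDEX table1_lower_field3_idx ON schema.table1 ((LOWER(field3)));
CREATE INDEX table1_field4_field5_idx ON schema.table1 (field4, field5);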

Also, subscribe to the list if you want help. this is not your private help
desk.


Re: [GENERAL] Index creation fails with automatic names

2013-10-17 Thread Ian Lawrence Barwick
2013/10/14 Florian Nigsch :
> Hi all,
>
> I am not sure if this is a bug or a misuse on my part.
>
> I am creating a number of indices in parallel on a table by using xargs. To
> do that, I write all my indices in a file indices.idx, and then have the
> indices built in parallel (in this case with 5 concurrent processes)
>
> cat indices.idx | xargs -P5 -I# psql -1 -c '#'
>
> indices.idx contains lines like this:
>
> ALTER TABLE schema.table1 ADD CONSTRAINT pk_activity PRIMARY KEY (field_sk);
>
> CREATE INDEX ON schema.table1 ((LOWER(field2)));
> CREATE INDEX ON schema.table1 ((LOWER(field3)));
> CREATE INDEX ON schema.table1 (field4, field5);
> CREATE INDEX ON schema.table1 (field4, field6, field5);
>
>
> Upon running the above command, I see the following error:
>
> ALTER TABLE
> CREATE INDEX
> ERROR:  duplicate key value violates unique constraint
> "pg_class_relname_nsp_index"
> DETAIL:  Key (relname, relnamespace)=(table1_lower_idx, 2064404) already
> exists.
>
> My question is then - where does this error come from? Is this because
> Postgres allocates the same name (table1_lower_idx) twice when the index
> begins building, because at that time there's no index present with that
> name? But if one index finishes earlier, then the second one can't be
> committed because it has the same name as an already present index?

It works fine for me on Pg 9.3.1:

postgres=# CREATE TABLE foo(val1 text, val2 text);
CREATE TABLE
postgres=# CREATE INDEX on foo((lower(val1)));
CREATE INDEX
postgres=# CREATE INDEX on foo((lower(val2)));
CREATE INDEX
postgres=# \d foo
   Table "public.foo"
Column | Type | Modifiers
+--+---
val1   | text |
val2   | text |
Indexes:
   "foo_lower_idx" btree (lower(val1))
   "foo_lower_idx1" btree (lower(val2))

Which PostgreSQL version are you using? Are you sure there's not an index
with the offending name already?

Regards

Ian Barwick





Re: [GENERAL] Remove or alter the default access privileges of the public schema by the database owner

2013-10-17 Thread Christian Affolter

Hello

Please accept my apologies for the double posting. The original mail was 
held off by the Majordomo mailing list software until a mailing list 
administrator would allow it to be delivered. Majordomo doesn't like the 
string 'remove' within the subject.
Thereupon, I informed the mailing list admins that I will resend the 
message with a new subject and that they can safely delete the original 
message. Apparently, my notice got overlooked somehow.



Sorry again.
Christian


On 15.10.2013 12:28, Christian Affolter wrote:

Hi everyone

I'm looking for a way to let a role which created a new database (is the
database owner) change (remove) the default access privileges of the
public schema, which allows everyone to use and create objects within
this schema. I do not want to give the role the SUPERUSER option.


[...]




Re: [GENERAL] Index creation fails with automatic names

2013-10-17 Thread Merlin Moncure
On Mon, Oct 14, 2013 at 5:31 AM, Florian Nigsch  wrote:
> Hi all,
>
> I am not sure if this is a bug or a misuse on my part.
>
> I am creating a number of indices in parallel on a table by using xargs. To
> do that, I write all my indices in a file indices.idx, and then have the
> indices built in parallel (in this case with 5 concurrent processes)
>
> cat indices.idx | xargs -P5 -I# psql -1 -c '#'
>
> indices.idx contains lines like this:
>
> ALTER TABLE schema.table1 ADD CONSTRAINT pk_activity PRIMARY KEY (field_sk);
>
> CREATE INDEX ON schema.table1 ((LOWER(field2)));
> CREATE INDEX ON schema.table1 ((LOWER(field3)));
> CREATE INDEX ON schema.table1 (field4, field5);
> CREATE INDEX ON schema.table1 (field4, field6, field5);
>
>
> Upon running the above command, I see the following error:
>
> ALTER TABLE
> CREATE INDEX
> ERROR:  duplicate key value violates unique constraint
> "pg_class_relname_nsp_index"
> DETAIL:  Key (relname, relnamespace)=(table1_lower_idx, 2064404) already
> exists.
>
> My question is then - where does this error come from? Is this because
> Postgres allocates the same name (table1_lower_idx) twice when the index
> begins building, because at that time there's no index present with that
> name? But if one index finishes earlier, then the second one can't be
> committed because it has the same name as an already present index?
>
> Any clarifications would be greatly appreciated!

hm. what happens when you set transaction isolation to serializable?

merlin




[GENERAL] COPY table FROM STDIN doesn't show count tag

2013-10-17 Thread Tim Kane

According to the docs, the output of a copy command should return a COPY
count.  It turns out this only happens when copying from a file, and not
from STDIN.

> Outputs
> On successful completion, a COPY command returns a command tag of the form
> COPY count
> The count is the number of rows copied.
> 
clone_test_rename=# create temp table aaa (field varchar(255));
CREATE TABLE
Time: 3.806 ms
clone_test_rename=# copy aaa from '/tmp/test.dat';
COPY 3
Time: 0.623 ms
clone_test_rename=# copy aaa from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 123
>> 456
>> hello
>> \.
Time: 8472.305 ms


Tested on 9.3.0
Is there another way I can determine the number of rows copied, short of
performing a pre and post count?   (other than not using stdin, that is).
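
One workaround sketch (the staging-table name is made up): COPY into a temp
table first, then move the rows with INSERT ... SELECT, whose command tag
carries the row count.

create temp table aaa_stage (like aaa);
copy aaa_stage from stdin;   -- paste the rows here, ending with \.
insert into aaa select * from aaa_stage;   -- command tag shows e.g. INSERT 0 3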






Re: [GENERAL] Idle transactions in PostgreSQL 9.2.4

2013-10-17 Thread Victor Yegorov
2013/10/15 Svetlin Manavski 

> I have some idle transactions in PostgreSQL 9.2.4 server which never end.
> My application was working fine on version 9.1 (BSD) but the problem
> appeared immediately as we ported it to 9.2.4 on Linux. The idle operations
> would frequently appear as COMMITs but sometimes I observed INSERTS as well.
> Please note that each commit below refers to *very light* inserts of
> single rows which are supposed to last milliseconds. However, some of them
> just never end, even though they are not waiting on other operations. See below:
>
> SELECT datname,usename,pid,client_addr,waiting,query_start,query FROM
> pg_stat_activity;
>

In 9.2 there's an extra column in the view that you're missing in your
query: state.

If state='active', then `query` shows the _currently running_ query.
Otherwise it shows the _last query_ executed by the session.

Check here:
http://www.postgresql.org/docs/9.2/static/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW
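
For example, something like this (a sketch) lists only the truly active
queries, rather than idle sessions whose `query` column merely shows the
last statement executed:

SELECT pid, state, waiting, query_start, query
FROM pg_stat_activity
WHERE state = 'active';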


-- 
Victor Y. Yegorov


[GENERAL] Strange results with pg_restore

2013-10-17 Thread Oscar Calderon
Have a good day. This Friday I'm going to migrate the entire database setup
of a government institution in my country. There are about 4 databases of
2GB each. So I was preparing for what I'm going to do tomorrow. They
currently have PostgreSQL 9.1 installed from source, I think, and they will
update the RHEL version to 5.7. It will be a reinstall of the operating
system, so before that I will back up the databases, copy the contents of
pg_hba.conf and so on, and after the reinstallation of RHEL I'm going to
install PostgreSQL 9.3 from the official repositories alongside contrib and
plpython.

I've already tested that on a CentOS 5.7 VM and it works fine. But well,
yesterday I was practicing creating and restoring backups on my
laptop, which has the following specs:

Core i7 2.2 Ghz
8GB RAM
30GB empty space
Windows 8.1

On my laptop I also have PostgreSQL 9.3. First I used the sample database
pgdellstore: I created a database and ran the SQL file
of the database. After that, I created a backup of that database
pgdellstore with pg_dump, in custom format, without compression:

pg_dump -h localhost -p 5432 -U postgres -W -Fc -Z0 -C -d pgdellstore >
pgdellstore.backup

And everything was normal. After that I created a second database and
restored my custom backup with pg_restore, trying to speed it up using the
-j option:

pg_restore -h localhost -p 5432 -U postgres -W -d pgdellstore -j4
pgdellstore.backup

Also, before that, I backed up my postgresql.conf and edited the current
one with the following parameters, according to some research that I did on
Google about performance tuning of pg_restore:

shared_buffers=2GB
maintenance_work_mem=1GB
work_mem=128MB
wal_buffers=16MB
checkpoint_segments=8
autovacuum=off
archive_mode=off
fsync=off
full_page_writes=off
checkpoint_timeout=15min
checkpoint_completion_target=0.9
track_counts=off
synchronous_commit=off
bgwriter_delay=50ms

And it restored in about 2 seconds, I think; for me that was extremely fast
(it's a database of approximately 25MB) and I had doubts, so I have some
queries to check database table sizes, and I compared the results of the
first database (the one that I created with the original SQL script) and the
second one (the one that I created from the custom backup of the first one)
and I got this:

DB created with original SQL script:

 schemaname |  tablename   | reltuples |  tamanio   | tamanioord
------------+--------------+-----------+------------+------------
 public | customers| 2 | 5016 kB|5136384
 public | orderlines   | 60350 | 4440 kB|4546560
 public | cust_hist| 60350 | 3976 kB|4071424
 public | products | 1 | 1552 kB|1589248
 public | orders   | 12000 | 1384 kB|1417216
 public | inventory| 1 | 704 kB | 720896
 public | categories   |16 | 24 kB  |  24576
 public | afiliado | 4 | 24 kB  |  24576
 public | pruebafechas | 0 | 8192 bytes |   8192
 public | reorder  | 0 | 0 bytes|  0
(10 filas)

DB created with custom backup based on first db

 schemaname |  tablename   | reltuples |  tamanio   | tamanioord
------------+--------------+-----------+------------+------------
 public | customers| 2 | 4992 kB|5111808
 public | orderlines   | 60350 | 4416 kB|4521984
 public | cust_hist| 60350 | 3952 kB|4046848
 public | products | 1 | 1528 kB|1564672
 public | orders   | 12000 | 1360 kB|1392640
 public | inventory| 1 | 680 kB | 696320
 public | afiliado | 4 | 24 kB  |  24576
 public | categories   |16 | 24 kB  |  24576
 public | pruebafechas | 0 | 8192 bytes |   8192
 public | reorder  | 0 | 0 bytes|  0
(10 filas)

This is the query that i used to get those results:

SELECT tbl.schemaname, tbl.tablename, obj.reltuples
, pg_size_pretty(pg_total_relation_size(text('"' || tbl.schemaname || '"."'
|| tbl.tablename || '"'))) tamanio, pg_total_relation_size(text('"' ||
tbl.schemaname || '"."' || tbl.tablename || '"')) tamanioord
FROM pg_tables tbl, pg_class obj WHERE tbl.tablename = obj.relname
AND tbl.schemaname NOT IN ('pg_catalog', 'information_schema') ORDER BY
tamanioord DESC;

As you can see, there are certain variations in size: the number of tuples
stays the same, but the fields tamanio (formatted size) and tamanioord (size
without format) differ; for example, the table customers has 5016 kB versus
4992 kB. I'm afraid that in some way I lost data or something like that; or,
according to your experience, can that be normal and not mean that
information is corrupted? Because tomorrow, when I do that with the
production databases, it will be chaos if information gets corrupted.

Also, my second concern is this: 4 months ago, I did a 

[GENERAL] day_trunc and day duration in a remote time zone

2013-10-17 Thread Marc Mamin
Hello,
I want to retrieve the day start and the day duration of an epoch value within
a given time zone, and return the day start as an epoch value.

The queries below work as expected, but I would appreciate a simpler
solution ...

example:

( http://localtimes.info/Europe/Cyprus/Nicosia/ )

select (TIMESTAMP WITH TIME ZONE 'epoch' + interval '1382886336 
seconds') AT TIME ZONE 'Europe/Nicosia'
2013-10-27 17:05:36



SELECT
EXTRACT (epoch FROM cast(date( (TIMESTAMP WITH TIME ZONE 'epoch' + interval 
'1382886336 seconds' )
AT TIME ZONE 'Europe/Nicosia') || ' 00:00 Europe/Nicosia' as timestamptz)) as 
day_start

=> 1382821200


SELECT
(
EXTRACT (epoch FROM cast(date( (TIMESTAMP WITH TIME ZONE 'epoch' + 
interval '1382983536 seconds' /* added 27 hours to the previous day_start 
result */ )
AT TIME ZONE 'Europe/Nicosia') || ' 00:00 Europe/Nicosia' as 
timestamptz))
-
1382821200 /* = day_start*/
)/3600 as hour_duration

=> 25 hours, which is correct as the daylight saving time ends at this date
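
One possibly simpler equivalent for the day_start part (a sketch, relying on
to_timestamp() and applying AT TIME ZONE twice):

SELECT extract(epoch FROM
         date_trunc('day', to_timestamp(1382886336) AT TIME ZONE 'Europe/Nicosia')
         AT TIME ZONE 'Europe/Nicosia') AS day_start;

-- should also yield 1382821200, matching the result above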



regards,

Marc Mamin



Re: [GENERAL] day_trunc and day duration in a remote time zone

2013-10-17 Thread Adrian Klaver

On 10/17/2013 08:47 AM, Marc Mamin wrote:

Hello,
I want to retrieve the day start and duration of an epoch  within a
given time zone
and return the day start as epoch.



Not quite sure what you are looking for, but something like:

hplc=> select current_date::timestamp, extract(epoch from 
current_date::timestamp);


      timestamp      | date_part
---------------------+------------
 2013-10-17 00:00:00 | 1381968000






--
Adrian Klaver
adrian.kla...@gmail.com




Re: [GENERAL] PostgreSQL vs Mongo

2013-10-17 Thread David Kerr
On Wed, Oct 16, 2013 at 09:30:59AM -0600, CS DBA wrote:
- All;
- 
- One of our clients is talking about moving to Mongo for their 
- reporting/data mart.  I suspect the real issue is the architecture of 
- their data mart schema, however I don't want to start pushing back if I 
- can't back it up.
- 
- Anyone have any thoughts on why we would / would not use Mongo for a 
- reporting environment.
- 
- what are the use cases where mongo is a good fit?
- what are the drawbacks long term?
- is mongo a persistent db or simply a big memory cache?
- does mongo have advantages over Postgres hstore?
- etc...
- 
- Thanks in advance...
- 
- /Kevin

I work with both. 

Mongo doesn't really seem appropriate for a datamart. Mongo supports
Map Reduce and has an Aggregation framework (which will give you a lot
of the functionality of SQL but is much more esoteric).

You need an index for every query you run and every possible sort order.
Mongo will cancel your query if the result set hits a certain size
without an index.

Doing ad-hoc queries is HARD, and there are no joins. If it's not in
your document you basically have to pull both documents into your app
and join them by hand.

Writes block reads; massive updates (like loads into a datamart) will need to
"yield" to allow reads to happen, and that only happens at a page fault.

You need to have enough memory to store your "working set", or performance
tanks. In a datamart your working set is frequently the whole thing.

People throw around the "schemaless" thing, but really there is some schema:
you have to know what you want your document to look like. So this means
schema changes as you grow your product, etc.

In a datamart you're not going to use 10gen's suggested schema change
methodology of "only apply data model changes when you access a record".
That works if you're looking up a single document at a time, but not if
you're mostly doing range scans and aggregations.

Mongo is very limited in how it can sort; we have a number of "sort fields"
added to our documents to give us different indexable sort orders. For
example, you can't do ORDER BY CASE statements.

IMO Mongo, like most NoSQL solutions, addresses write scaling and availability
by making them easier to do. You can generally shard without bothering the
application too much, and you get free seamless failover with the replica sets.


Hope this is helpful

Dave




Re: [GENERAL] Index creation fails with automatic names

2013-10-17 Thread Kevin Grittner
Merlin Moncure  wrote:

> On Mon, Oct 14, 2013 at 5:31 AM, Florian Nigsch  wrote:

>> I am creating a number of indices in parallel on a table by using xargs. To
>> do that, I write all my indices in a file indices.idx, and then have the
>> indices built in parallel (in this case with 5 concurrent processes)
>>
>> cat indices.idx | xargs -P5 -I# psql -1 -c '#'
>>
>> indices.idx contains lines like this:
>>
>> ALTER TABLE schema.table1 ADD CONSTRAINT pk_activity PRIMARY KEY
>> (field_sk);
>>
>> CREATE INDEX ON schema.table1 ((LOWER(field2)));
>> CREATE INDEX ON schema.table1 ((LOWER(field3)));
>> CREATE INDEX ON schema.table1 (field4, field5);
>> CREATE INDEX ON schema.table1 (field4, field6, field5);
>>
>>
>> Upon running the above command, I see the following error:
>>
>> ALTER TABLE
>> CREATE INDEX
>> ERROR:  duplicate key value violates unique constraint
>> "pg_class_relname_nsp_index"
>> DETAIL:  Key (relname, relnamespace)=(table1_lower_idx, 2064404) already
>> exists.
>>
>> My question is then - where does this error come from? Is this because
>> Postgres allocates the same name (table1_lower_idx) twice when the index
>> begins building, because at that time there's no index present with that
>> name? But if one index finishes earlier, then the second one can't be
>> committed because it has the same name as an already present index?

I'm going to go along with the suggestion that you explicitly name
them when you create the indices.idx file.  When these all start
together, they probably cannot see each others' catalog entries,
and so don't think they are choosing duplicate names.

> hm. what happens when you set transaction isolation to
> serializable?

I would not expect that to help; since system tables weren't using
MVCC snapshots when SSI was implemented, they were excluded from
serializable behavior.  It might be worth revisiting that now that
we have MVCC catalog access, but in this case it would just replace
one type of error with another.
 
-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] Index creation fails with automatic names

2013-10-17 Thread Kevin Grittner
Ian Lawrence Barwick  wrote:

> It works fine for me on Pg 9.3.1:
>
> postgres=# CREATE TABLE foo(val1 text, val2 text);
> CREATE TABLE
> postgres=# CREATE INDEX on foo((lower(val1)));
> CREATE INDEX
> postgres=# CREATE INDEX on foo((lower(val2)));
> CREATE INDEX

You seem to be creating the indexes one at a time, all on the same
connection.  The OP's problem occurs when starting five CREATE
INDEX statements in five different sessions all at the same time.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[GENERAL] when do I analyze after concurrent index creation?

2013-10-17 Thread AI Rumman
Hi,

I have a very basic question.
If I create an index concurrently, do I need to analyze the table afterward?
If yes, when?
Please let me know.

Thanks.


Re: [GENERAL] Need some help on Performance 9.0.4

2013-10-17 Thread Tomas Vondra
On 17.10.2013 20:56, akp geek wrote:
> got the output from depesz and this is the top output on the system.  thanks
> for the help ..

From depesz? That site works differently - you enter the explain plan
into the form, it does some formatting/highlighting and you're supposed
to submit the link to that page. E.g. http://explain.depesz.com/s/JwTB
is for your query.

However it seems you've done some anonymization on the query, e.g.
replacing the table/index/function names and string literals with some
random words.

Anyway, the plan shows the estimates are reasonably accurate, but the
index scan on romeo_four consumes ~99% of the query duration. Not sure
if it can be improved, because we don't know the original query and the
anonymization made it rather annoying to deal with.

> 
> Regards
> 
> load averages:  30.3,  28.8,  27.8;up
> 680+00:51:09  
> 
> 18:24:44
> 156 processes: 123 sleeping, 33 on cpu
> CPU states: 49.5% idle, 50.0% user,  0.4% kernel,  0.0% iowait,  0.0% swap
> Memory: 64G phys mem, 19G free mem, 63G total swap, 63G free swap
> 
>PID USERNAME LWP PRI NICE  SIZE   RES STATETIMECPU COMMAND
>   7965 postgres   1  200 3178M 3171M cpu/7   78:21  1.58% postgres
>  20638 postgres   1  200 3181M 3175M cpu/28 187:01  1.58% postgres
>  22819 postgres   1  300 3181M 3175M cpu/33 264:06  1.57% postgres
>   3789 postgres   1  200 3183M 3176M cpu/9  626:11  1.57% postgres
>  10375 postgres   1  300 3182M 3175M cpu/50 293:55  1.57% postgres
>  27335 postgres   1  300 3175M 3169M cpu/29  27:27  1.57% postgres
>   7959 postgres   1  300 3178M 3171M cpu/41 104:07  1.57% postgres
>   8156 postgres   1  300 3180M 3173M cpu/43 124:18  1.56% postgres
>  20640 postgres   1   00 3182M 3176M cpu/58 207:49  1.56% postgres
>  10584 postgres   1  300 3179M 3173M cpu/35  76:32  1.56% postgres
>  13984 postgres   1  300 3181M 3174M cpu/30 207:04  1.56% postgres
>   3575 postgres   1  200 3283M 3261M cpu/19   1:48  1.56% postgres
>   7964 postgres   1  400 3177M 3170M cpu/62  82:56  1.56% postgres
>   1299 postgres   1  400 3166M 3161M cpu/52   5:11  1.56% postgres
>  27692 postgres   1   00 3181M 3174M cpu/46 260:58  1.56% postgres


Once again, this top output is incomplete - all it lists are postgres
processes (and it's unclear what they're doing), but there are certainly
some other processes running. Moreover, those 15 processes account for
only ~25% of CPU, but there's almost 50% CPU utilized. So where's the rest?

Tomas




[GENERAL] How do I create a box from fields in a table?

2013-10-17 Thread Rob Richardson
I need to determine whether a given pair of coordinates is inside a given 
rectangle.  According to the documentation, PostgreSQL provides the box and 
point types and a "contains" operator that will be perfect for this.  However, 
the example provided in the documentation only shows the creation of boxes and 
points from constant values.  So, the following query works:

select tran_car_identification, box '((0, 1), (2, 3))' from transfer_car

But if I try to use a field from the transfer_car table, it doesn't work:

select tran_car_identification, box '((location, 1), (2, 3))' from transfer_car

That gives me an "invalid input syntax for type box" error.

How do I create a box object using data from the transfer_car table?

Thanks very much!

RobR




Re: [GENERAL] How do I create a box from fields in a table?

2013-10-17 Thread Merlin Moncure
On Thu, Oct 17, 2013 at 4:04 PM, Rob Richardson
 wrote:
> I need to determine whether a given pair of coordinates is inside a given 
> rectangle.  According to the documentation, PostgreSQL provides the box and 
> point types and a "contains" operator that will be perfect for this.  
> However, the example provided in the documentation only shows the creation of 
> boxes and points from constant values.  So, the following query works:
>
> select tran_car_identification, box '((0, 1), (2, 3))' from transfer_car
>
> But if I try to use a field from the transfer_car table, it doesn't work:
>
> select tran_car_identification, box '((location, 1), (2, 3))' from 
> transfer_car
>
> That gives me an "invalid input syntax for type box" error.
>
> How do I create a box object using data from the transfer_car table?

You have to construct the string. This is somewhat baroque by modern
Postgres standards, but should work:

select tran_car_identification,
       format('((%s, 1), (2, 3))', location)::box
from transfer_car;
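
From there, the containment test the original post asks about could look like
this (a sketch; the point value is made up):

select tran_car_identification
from transfer_car
where format('((%s, 1), (2, 3))', location)::box @> point '(1, 2)';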

merlin




[GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

2013-10-17 Thread Huang, Suya
Hi,

I've got a question about converting a database from ASCII to UTF-8: what's the 
best approach if the database is very large? A detailed procedure or shared 
experience would be much appreciated!


Thanks,
Suya



Re: [GENERAL] werid error"index row size 3040 exceeds btree maximum, 2712" occur randomly

2013-10-17 Thread Huang, Suya
Thanks Tomas!

I'll spend some time on the link you sent; I'm a new learner of Postgres :-)

Thanks,
Suya

-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Tomas Vondra
Sent: Wednesday, October 16, 2013 9:06 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] werid error"index row size 3040 exceeds btree maximum, 
2712" occur randomly

On 15.10.2013 03:44, Huang, Suya wrote:
> Thanks Tomas!
> 
> However, in the example I sent, I already did a vacuum full right
> after deleting the rows causing the problem, before creating the index, and
> got an error even though the table was vacuumed. Note, the table is one I
> temporarily created using create table as select *... so no other
> people are accessing that table, except me for testing purposes.
> 
> Any ideas? And today, when I did the same thing, I could create the index
> on the table right after I deleted the rows causing the problem, without
> a vacuum.
> 
> Anything I missed here?

Not sure. The only thing I can think of is another transaction preventing the 
autovacuum from removing the rows, but I can't reproduce it. What PostgreSQL 
version are you using?

BTW take a look at this contrib module:

   http://www.postgresql.org/docs/9.1/static/pageinspect.html

It allows looking directly on items on a data page like this:

   select * from heap_page_items(get_raw_page('test', 0));

Maybe that'll help you identify the problem.

Tomas




Re: [GENERAL] Index creation fails with automatic names

2013-10-17 Thread Ian Lawrence Barwick
2013/10/18 Kevin Grittner :
> Ian Lawrence Barwick  wrote:
>
>> It works fine for me on Pg 9.3.1:
>>
>> postgres=# CREATE TABLE foo(val1 text, val2 text);
>> CREATE TABLE
>> postgres=# CREATE INDEX on foo((lower(val1)));
>> CREATE INDEX
>> postgres=# CREATE INDEX on foo((lower(val2)));
>> CREATE INDEX
>
> You seem to be creating the indexes one at a time, all on the same
> connection.  The OP's problem occurs when starting five CREATE
> INDEX statements in five different sessions all at the same time.

(reads original email again) ah yes, brain was not properly engaged.




Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

2013-10-17 Thread Steve Atkins

On Oct 17, 2013, at 3:13 PM, "Huang, Suya"  wrote:

> Hi,
>  
> I’ve got a question of converting database from ascii to UTF-8, what’s the 
> best approach to do so if the database size is very large? Detailed procedure 
> or experience sharing are much appreciated!
>  

The answer to that depends on what you mean by "ascii".

If your current database uses SQL_ASCII encoding - that's not ascii. It could 
have anything in there, including any mix of encodings and there's been no 
enforcement of any encoding, so there's no way of knowing what they are. If 
you've had, for example, webapps that let people paste word documents into 
them, you potentially have different encodings used in different rows of the 
same table.

If your current data is like that then you're probably looking at doing some 
(manual) data cleanup to work out what encoding your data is really in, and 
converting it to something consistent rather than a simple migration from ascii 
to utf8.

Cheers,
  Steve





Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

2013-10-17 Thread John R Pierce

On 10/17/2013 3:13 PM, Huang, Suya wrote:
I’ve got a question of converting database from ascii to UTF-8, what’s 
the best approach to do so if the database size is very large? 
Detailed procedure or experience sharing are much appreciated!



I believe you will need to dump the whole database, and import it into a 
new database that uses UTF8 encoding. As far as I know, there's no way 
to convert the encoding in place. As the other gentlemen pointed out, you 
also will have to convert/sanitize all text data, as your current 
SQL_ASCII fields could easily contain stuff that's not valid UTF8.


For large databases, this is a major undertaking. I find it's often 
easiest to do a major change like this between the old and a new 
database server.


--
john r pierce  37N 122W
somewhere on the middle of the left coast





Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

2013-10-17 Thread Huang, Suya
Thanks Steve,

Yes, we're using SQL_ASCII.

Would you please be more specific about the manual data cleanup work here? I'm 
new to Postgres and have no prior experience with character set conversion, so 
any specific experience shared would be very much appreciated.

Thanks,
Suya

-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Steve Atkins
Sent: Friday, October 18, 2013 11:08 AM
To: pgsql-general@postgresql.org General
Subject: Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert 
database character set ?


[...]


Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

2013-10-17 Thread Adrian Klaver

On 10/17/2013 08:51 PM, Huang, Suya wrote:

Thanks Steve,

Yes, we're using SQL_ASCII.

Would you please be more specific about the manual data cleanup work here? I'm 
new to Postgres and have no prior experience with character set conversion, so 
any specific experience shared would be very much appreciated.


This is a good place to start:

http://www.postgresql.org/docs/9.3/interactive/multibyte.html



Thanks,
Suya



--
Adrian Klaver
adrian.kla...@gmail.com




Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

2013-10-17 Thread Huang, Suya
Forgot to mention: we're using a very old version, which is 8.3.11.

I'll take a look at the 8.3 guide, which has a similar section.

Thanks Adrian!


-Original Message-
From: Adrian Klaver [mailto:adrian.kla...@gmail.com] 
Sent: Friday, October 18, 2013 3:05 PM
To: Huang, Suya; Steve Atkins; pgsql-general@postgresql.org General
Subject: Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert 
database character set ?

[...]


Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

2013-10-17 Thread Huang, Suya
Yes John, we probably will use a new database server here to accommodate the 
converted databases.

By saying export/import, do you mean:
1. pg_dump  (// should I specify -E UTF8 to dump the data in UTF-8 encoding?)
2. create database xxx -E UTF8
3. pg_restore

I also saw someone doing this the following way:
1. Perform a plain text dump of the database.
pg_dump -f db.sql [dbname]
2. Convert the character encodings.
iconv db.sql -f ISO-8859-1 -t UTF-8 -o db.utf8.sql
3. Create the UTF8 database.
createdb utf8db  (// I'm not sure why he's not specifying the DB encoding
here; maybe it's better to use -E to specify the encoding as UTF8)
4. Restore the converted UTF8 database.
psql -d utf8db -f db.utf8.sql

Which method is better? From what I can tell, the second approach generates a 
bigger dump file, so it would be better to pipe it through bzip2 to get a 
compressed file. But other than that, any other considerations?

Thanks,
Suya
-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce
Sent: Friday, October 18, 2013 11:23 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert 
database character set ?

[...]


Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

2013-10-17 Thread John R Pierce

On 10/17/2013 9:49 PM, Huang, Suya wrote:

Yes John, we probably will use a new database server here to accommodate those 
converted database.

By saying export/import, do you mean by :
1. pg_dump  (//should I specify -E  UTF 8 to dump the data in UTF-8 encoding?)
2. create database xxx -E UTF8
3. pg_restore


I don't believe 8.3 supported multiple different encodings on the same 
server instance; that's relatively new.


Before you can import your SQL_ASCII data, you need to know what charset 
the data is actually in. Is it UTF8 data stored in SQL_ASCII, or is it 
LATIN1 (ISO-8859)? Or some sort of Big5 or euc_cn? Or what? If it is 
already UTF8 data, are you sure that there are no invalid encodings 
accidentally stored? Postgres with SQL_ASCII does no character 
validation... if it's all USASCII (0x00 to 0x7F) then you're OK.


I would strongly recommend this new database server be running a 
currently supported version; I'd probably use 9.2. Configure the old 
server to allow the postgres user on the new server to connect and log 
on, and while logged onto the new server, run something like...


pg_dump -h oldserver -U postgres -E oldencoding -s -f olddatabase.schema.sql olddatabase
pg_dump -h oldserver -U postgres -E oldencoding -a -f olddatabase.data.sql olddatabase

createuser newuser
createdb -O newuser -l en_US.utf8 newdbname
psql -d newdbname -U newuser -f olddatabase.schema.sql
psql -d newdbname -U newuser -f olddatabase.data.sql

If the data import fails due to an invalid encoding, then you may have to 
pass the .data.sql file through iconv (and remove the SET 
client_encoding SQL commands from it).





--
john r pierce  37N 122W
somewhere on the middle of the left coast





Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

2013-10-17 Thread Huang, Suya
Hi John,

" Is it UTF8 data stored in SQL_ASCII or is it 
LATIN1 (ISO-8859) ? or some sort of Big5 or euc_cn ?  or what?  if it is 
already UTF8 data, are you sure that there are no invalid encodings 
accidentally stored?   Postgres with SQL_ASCII does no character 
validation...  if its all USASCII (0x00 to 0x7F) then you're OK."

Question: How can I pull the real character set information out of the 
database? Does it rely on knowledge of the business data?

So, the real data stored in the database decides whether we need to use iconv 
to convert it to UTF-8. If the data is USASCII, then the pg_dump/restore 
process you provided should be sufficient and complete. Do I understand this 
correctly?

Besides, we do have databases with different encodings on the same server 
instance; see below:

Welcome to psql 8.3.11, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
   \h for help with SQL commands
   \? for help with psql commands
   \g or terminate with semicolon to execute query
   \q to quit

postgres=# \l
                    List of databases
          Name           |  Owner  | Encoding
-------------------------+---------+-----------
 admin                   | dba     | UTF8
 pgdb_1_sky              | pgdb_1  | SQL_ASCII
 pgdb_1_sky_utf8         | pgdb_1  | UTF8
 pgdb_1_ca               | pgdb_1  | SQL_ASCII
 pgdb_1_us               | pgdb_1  | SQL_ASCII
 pgdb_sky                | pgdb    | SQL_ASCII
 pgdb_sky_users          | pgdb    | SQL_ASCII
 pgdb_sky_users_utf8     | pgdb    | UTF8
 pgdb_sky_utf8           | pgdb    | UTF8
 pgdb_sky_utf8_86465_old | pgdb    | UTF8

Thanks,
Suya
-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce
Sent: Friday, October 18, 2013 4:12 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert 
database character set ?

[...]


Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

2013-10-17 Thread John R Pierce

On 10/17/2013 10:51 PM, Huang, Suya wrote:

Question: How can I pull the real character set information out of the 
database? Does it rely on knowledge of the business data?



What did you store in it? Because it's declared SQL_ASCII, postgres 
doesn't know; it's all just bytes. You could have stored standard 7-bit 
USASCII, or you could have stored ISO-8859-1 (-2, -3, ...), or you could 
have stored one of the myriad non-UTF Asian multibyte character 
codes. Postgres doesn't know or care what you put in there, and it 
doesn't check to ensure it's valid.


IF all your data is in a consistent encoding, and you specify that 
encoding on the pg_dump command, then the psql command should be able to 
restore it as-is to the new UTF8 database via the magic of 
client_encoding. If the data is not consistent, you'll have a much 
harder time.






--
john r pierce  37N 122W
somewhere on the middle of the left coast


