Re: [GENERAL] Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1

2014-07-04 Thread Albe Laurenz
sunpeng wrote:
>>> load data to postgresql in cmd(encoding is GBK) is WIN8:
>>> 
>>> psql -h localhost  -d test -U postgres <  dbdata.sql
>>>
>>> I got the error:
>>> ERROR:  invalid byte sequence for encoding "UTF8": 0xff

>> If the encoding is GBK then you will get errors (or incorrect
>> characters) if it is read as UTF8.  Try setting the environment
>> variable PGCLIENTENCODING.
>> 
>> http://www.postgresql.org/docs/9.1/static/app-psql.html

> I've changed cmd (in win8) to encoding utf8 through chcp 65001, but the error
> still occurs.
> And I use the following cmd to dump mysql data:
> mysql> select Picture from personpicture where id = 
> 'F2931306D1EE44ca82394CD3BC2404D4'  into outfile
> "d:\\1.txt" ;
> I got the ansi file, and use Ultraedit to see first 16 bytes:
> FF D8 FF E0 5C 30 10 4A 46 49 46 5C 30 01 01 5C
> 
> It's different from mysql workbench to see:
> FF D8 FF E0 00 10 4a 46 49 46 00 01 01 00 00 01

Changing the terminal code page won't do anything; the problem is probably that
the data are in a different encoding.

I don't know enough about MySQL to know which encoding it uses when dumping 
data,
but the man page of "mysqldump" tells me:

  --set-charset
  Add SET NAMES default_character_set to the output. This option is enabled by 
default.

So is there a SET NAMES command in the dump? If yes, what is the argument?

You will have to tell PostgreSQL the encoding of the data.
As Kevin pointed out, you can do that by setting the environment variable
PGCLIENTENCODING to the correct value.  Then PostgreSQL will convert the
data automatically.
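
For example, in the Windows command prompt (assuming the dump really is in GBK;
substitute whatever encoding it actually uses):

    set PGCLIENTENCODING=GBK
    psql -h localhost -d test -U postgres < dbdata.sql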

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1

2014-07-04 Thread sunpeng
Thank you, friend, I use  --hex-blob :
mysqldump -v -nt --complete-insert=TRUE --compatible=postgresql
--default-character-set=utf8 --skip-add-locks --compact --no-create-info
--skip-quote-names --hex-blob -uroot -p test videorecresult >dbdata.sql
to dump mysql data.
And replace the blob data "0x" with "E'\\xx'" to load the data into
postgresql.
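
(For example, using the bytes from the earlier mail: the MySQL hex literal
0xFFD8FFE0 becomes E'\\xFFD8FFE0' on the PostgreSQL side, assuming the target
column is bytea.)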




On Fri, Jul 4, 2014 at 3:27 PM, Albe Laurenz 
wrote:

> sunpeng wrote:
> >>> load data to postgresql in cmd(encoding is GBK) is WIN8:
> >>>
> >>> psql -h localhost  -d test -U postgres <  dbdata.sql
> >>>
> >>> I got the error:
> >>> ERROR:  invalid byte sequence for encoding "UTF8": 0xff
>
> >> If the encoding is GBK then you will get errors (or incorrect
> >> characters) if it is read as UTF8.  Try setting the environment
> >> variable PGCLIENTENCODING.
> >>
> >> http://www.postgresql.org/docs/9.1/static/app-psql.html
>
> > I‘v changed cmd (in win8) to encoding utf8 through chcp 65001, but error
> still occurs.
> > And i use the following cmd to dump mysql data:
> > mysql> select Picture from personpicture where id =
> 'F2931306D1EE44ca82394CD3BC2404D4'  into outfile
> > "d:\\1.txt" ;
> > I got the ansi file, and use Ultraedit to see first 16 bytes:
> > FF D8 FF E0 5C 30 10 4A 46 49 46 5C 30 01 01 5C
> >
> > It's different from mysql workbench to see:
> > FF D8 FF E0 00 10 4a 46 49 46 00 01 01 00 00 01
>
> Changing the terminal code page won't do anything, it's probably the data
> that are in a different encoding.
>
> I don't know enough about MySQL to know which encoding it uses when
> dumping data,
> but the man page of "mysqldump" tells me:
>
>   --set-charset
>   Add SET NAMES default_character_set to the output. This option is
> enabled by default.
>
> So is there a SET NAMES command in the dump? If yes, what is the argument?
>
> You will have to tell PostgreSQL the encoding of the data.
> As Kevin pointed out, you can do that by setting the environment variable
> PGCLIENTENCODING to the correct value.  Then PostgreSQL will convert the
> data automatically.
>
> Yours,
> Laurenz Albe
>


Re: [GENERAL] Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1

2014-07-04 Thread John R Pierce

On 7/4/2014 2:12 AM, sunpeng wrote:

Thank you, friend, I use  --hex-blob :
mysqldump -v -nt --complete-insert=TRUE --compatible=postgresql 
--default-character-set=utf8 --skip-add-locks --compact 
--no-create-info --skip-quote-names --hex-blob -uroot -p test 
videorecresult >dbdata.sql

to dump mysql data.
And replace blob data "0x" into "E'\\xx'" to load data into 
postgresql.


Regardless of all that, 0xFF is not a valid byte in UTF8. Perhaps 
you should store the data in a postgres BYTEA, or at least use the character 
encoding SQL_ASCII (which is to say, no encoding: bytes in == bytes out).



--
john r pierce  37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1

2014-07-04 Thread Albe Laurenz
John R Pierce wrote:
> On 7/4/2014 2:12 AM, sunpeng wrote:
>> Thank you, friend, I use  --hex-blob :
>> mysqldump -v -nt --complete-insert=TRUE --compatible=postgresql
>> --default-character-set=utf8 --skip-add-locks --compact
>> --no-create-info --skip-quote-names --hex-blob -uroot -p test
>> videorecresult >dbdata.sql
>> to dump mysql data.
>> And replace blob data "0x" into "E'\\xx'" to load data into
>> postgresql.

> Regardless of all that, 0xFF is not a valid byte in UTF8. Perhaps
> you should store the data in a postgres BYTEA, or at least use the character
> encoding SQL_ASCII (which is to say, no encoding: bytes in == bytes out).

Exactly.
According to mysqldump's man page, the affected field must be
BINARY, VARBINARY, the BLOB types or BIT.
For these PostgreSQL's "bytea" would definitely be the correct data type,
and there wouldn't be any encoding problems with that.
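
For example (a sketch only, with the table and column names assumed from the
earlier mails):

    CREATE TABLE personpicture (
        id      varchar(50) PRIMARY KEY,
        picture bytea
    );
    -- a mysqldump --hex-blob value such as 0xFFD8FFE0... can be loaded with decode():
    INSERT INTO personpicture
        VALUES ('F2931306D1EE44ca82394CD3BC2404D4', decode('FFD8FFE0', 'hex'));

bytea data is never run through an encoding conversion, so bytes like 0xFF are
not a problem there.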

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Windows releases - Bundle OpenSSL includes and .libs in the installer?

2014-07-04 Thread Craig Ringer
On 07/03/2014 05:41 PM, Dave Page wrote:
> It shouldn't be an issue for psqlODBC itself as users of that link
> against the driver manager not the driver. I suspect what Craig is
> trying to do is build the driver against our server installation, which
> means he may need OpenSSL now, and other dependencies in the future

Exactly.

It's not vital. After all, I can just build my own libpq + openssl +
gettext etc. But I'd rather reduce divergence from the published
psqlODBC drivers if possible, by bundling the "official" libpq Windows
binaries. As psqlODBC must link directly to OpenSSL, that won't work with
how they're currently distributed - or at least not without possible issues
due to OpenSSL library mismatches.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Random-looking primary keys in the range 100000..999999

2014-07-04 Thread Kynn Jones
I'm looking for a way to implement pseudorandom primary keys in the range
100000..999999.

The randomization scheme does not need to be cryptographically strong.  As
long as it is not easy to figure out in a few minutes it's good enough.

My starting point for this is the following earlier message to this list:

http://www.postgresql.org/message-id/49f96730.4000...@postnewspapers.com.au

The answer given to it here

http://www.postgresql.org/message-id/448163db-cac5-4e99-8c4c-57cbc6f6af78@mm

...is really cool, but I don't see how to modify it for the case where the
domain of the permutation has a cardinality that is not a power of 2, as it
is in my case (cardinality = 900000).

---

(In the crypto world there are "format preserving encryption" techniques
that probably could do what I want to do, but their focus on cryptographic
strength makes learning and implementing them tough going, plus, the
performance will probably be poor, since a high work factor is an asset in
such crypto applications.  Since cryptographic strength is not something I
need, I'm trying to find non-crypt-grade alternatives.)

Thanks in advance!

kynn


Re: [GENERAL] Random-looking primary keys in the range 100000..999999

2014-07-04 Thread hubert depesz lubaczewski
How many rows do you plan on having in this table? Why this particular key
range?

depesz


On Fri, Jul 4, 2014 at 3:24 PM, Kynn Jones  wrote:

> I'm looking for a way to implement pseudorandom primary keys in the range
> 100000..999999.
>
> The randomization scheme does not need to be cryptographically strong.  As
> long as it is not easy to figure out in a few minutes it's good enough.
>
> My starting point for this is the following earlier message to this list:
>
> http://www.postgresql.org/message-id/49f96730.4000...@postnewspapers.com.au
>
> The answer given to it here
>
>
> http://www.postgresql.org/message-id/448163db-cac5-4e99-8c4c-57cbc6f6af78@mm
>
> ...is really cool, but I don't see how to modify it for the case where the
> domain of the permutation has a cardinality that is not a power of 2, as it
> is in my case (cardinality = 900000).
>
> ---
>
> (In the crypto world there are "format preserving encryption" techniques
> that probably could do what I want to do, but their focus on cryptographic
> strength makes learning and implementing them tough going, plus, the
> performance will probably be poor, since high workloads are an asset for
> such crypto applications.  Since cryptographic strength is not something I
> need, I'm trying to find non-crypt-grade alternatives.)
>
> Thanks in advance!
>
> kynn
>
>


Re: [GENERAL] pg_dump slower than pg_restore

2014-07-04 Thread Tom Lane
David Wall  writes:
> It just seems odd that pg_dump is slower than pg_restore to me. Most 
> grumblings I read about suggest that pg_restore is too slow.

> I have noted that the last split file segment will often appear to be 
> done -- no file modifications -- while pg_dump is still running, often 
> for another 20 minutes or so, and then some last bit is finally 
> written.  It's as if pg_dump is calculating something at the end that is 
> quite slow.  At startup, there's a delay before data is written, too, 
> but it's generally 1-2 minutes at most.

You haven't given us much info about the contents of this database.
Are there a lot of tables? functions? large objects?  How many is
"a lot", if so?

I'm suspicious that you're paying a penalty associated with pg_dump's
rather inefficient handling of metadata for large objects, but there's
not enough info in this thread to diagnose it.  It'd be very interesting
to see perf or oprofile stats on the pg_dump run, particularly during
the parts where it doesn't seem to be writing anything.
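
For example, something along these lines (assuming perf is available on that
machine; the pid placeholder is whatever pg_dump reports):

    perf record -g -p <pid of pg_dump> -- sleep 60
    perf report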

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Random-looking primary keys in the range 100000..999999

2014-07-04 Thread Kynn Jones
On Fri, Jul 4, 2014 at 10:13 AM, hubert depesz lubaczewski  wrote:

> How many rows do you plan on having in this table?
>

Currently, only around 10K, but there's an expectation that the number will
grow.  It's hard to predict by how much, hence the generous extra space.


> Why this particular key range?
>

The requirements I've been given for the keys are that they be numeric,
reasonably easy to type (hence, no 40-digit keys), never beginning with 0,
and carrying no additional information content (or even suggesting it).
Among the pieces of information that the key should not include is the
relative time of entry into the DB (hence, the keys should be more or less
evenly distributed over the 100K-1M range).

k


Re: [GENERAL] Random-looking primary keys in the range 100000..999999

2014-07-04 Thread Tom Lane
Kynn Jones  writes:
> The requirements I've been given for the keys is that they be numeric,
> reasonably easy to type (hence, no 40-digit keys), never beginning with 0,
> and carrying no additional information content (or even suggesting it).

Why not just

(random()*899999)::int + 100000

This is unlikely to be cryptographically secure, but you didn't say you
needed that.  You will need to check for collisions, but that seems like
something you really ought to do anyway.
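
A minimal sketch of such a collision check (untested; "mytable"/"id" are made-up
names, and it does nothing about two concurrent sessions picking the same value):

    CREATE OR REPLACE FUNCTION pick_random_id() RETURNS integer AS $$
    DECLARE
        candidate integer;
    BEGIN
        LOOP
            candidate := (random() * 899999)::int + 100000;
            PERFORM 1 FROM mytable WHERE id = candidate;   -- already taken?
            IF NOT FOUND THEN
                RETURN candidate;
            END IF;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;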

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_dump slower than pg_restore

2014-07-04 Thread David Wall


On 7/4/2014 7:19 AM, Tom Lane wrote:

You haven't given us much info about the contents of this database.
Are there a lot of tables? functions? large objects?  How many is
"a lot", if so?

I'm suspicious that you're paying a penalty associated with pg_dump's
rather inefficient handling of metadata for large objects, but there's
not enough info in this thread to diagnose it.  It'd be very interesting
to see perf or oprofile stats on the pg_dump run, particularly during
the parts where it doesn't seem to be writing anything.


There are only 32 tables, no functions, but mostly large objects.  Not 
sure how to measure the LOs, but from a quick check of the table sizes I 
estimate only 2GB there, so 16GB could be LOs.  There are 7,528,803 entries 
in pg_catalog.pg_largeobject.


pg_database_size reports 18GB

biggest table sizes:
 relation                           | size
------------------------------------+--------
 public.esf_formparty               | 635 MB
 public.esf_activity_log            | 416 MB
 public.esf_form                    | 181 MB
 public.esf_encrypted_blob          | 134 MB
 public.esf_activity_log_ownertime  | 73 MB
 public.esf_tranfield               | 72 MB
 public.esf_formpartytranididx      | 70 MB
 public.esf_formparty_pkey          | 65 MB
 public.esf_encrypted_blob_pkey     | 64 MB
 public.esf_formpartyididx          | 63 MB
 public.esf_tranfield_pkey          | 52 MB
 public.esf_formpartypickupidx      | 51 MB
 public.esf_activity_log_typetime   | 47 MB
 public.esf_tran                    | 46 MB
 public.esf_formorderidx            | 46 MB
 public.esf_form_pkey               | 42 MB
 public.esf_tranfieldvalueidx       | 39 MB
 public.esf_traninittimeidx         | 19 MB
 public.esf_tranupdatetimeidx       | 19 MB
 public.esf_tran_pkey               | 13 MB

Basic top stats while running show:

top - 08:53:40 up 27 days, 17:38,  1 user,  load average: 1.03, 1.12, 1.22
Tasks: 156 total,   1 running, 155 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.3%us,  0.6%sy,  0.4%ni, 74.2%id, 23.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3974112k total,  3954520k used,    19592k free,    46012k buffers
Swap:  4245496k total,    29996k used,  4215500k free,  1123844k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7549 esignfor  20   0  116m 1372  884 S  3.0  0.0  16:39.69 gpg --batch --symmetric --cipher-algo AES256 --passphrase 3z4ig0Rq]w
 7547 esignfor  30  10 1148m 1.0g  852 S  2.3 26.9  14:10.27 pg_dump --format=c --oids ibc01
 7548 esignfor  20   0  4296  748  372 S  2.3  0.0  13:05.44 gzip
 7551 esignfor  20   0  555m 413m 410m D  1.7 10.6   9:32.03 postgres: esignforms ibc01 [local]
 1978 esignfor  20   0 15032 1372 1004 R  0.7  0.0   0:00.27 top -c
 7550 esignfor  20   0 98.6m  592  472 S  0.3  0.0   0:49.80 split -b 51200 - /home/esignforms/customers/archive/db/dump.20140704.gz.gpg




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_dump slower than pg_restore

2014-07-04 Thread Bosco Rama
On 07/03/14 22:51, David Wall wrote:
> 
> It just seems odd that pg_dump is slower than pg_restore to me. Most 
> grumblings I read about suggest that pg_restore is too slow.
> 
> I have noted that the last split file segment will often appear to be 
> done -- no file modifications -- while pg_dump is still running, often 
> for another 20 minutes or so, and then some last bit is finally 
> written.  It's as if pg_dump is calculating something at the end that is 
> quite slow.  At startup, there's a delay before data is written, too, 
> but it's generally 1-2 minutes at most.

Random thought:  What OS & kernel are you running?  Kernels between
3.2.x and 3.9.x were known to have IO scheduling issues.  This was
highlighted most by the kernel in Ubuntu 12.04 (precise) as shown
here:



Bosco.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] index contains unexpected zero page at block error

2014-07-04 Thread François Beausoleil
Hi,

I ran a query on the slave and got the error message in the subject. I rebuilt 
the slave two days ago on ZFS (with compression). I’m wondering how that 
unexpected zero page happened in the first place.

Master:
Linux kalanchoe 3.2.0-58-generic #88-Ubuntu SMP Tue Dec 3 17:37:58 UTC 2013 
x86_64 x86_64 x86_64 GNU/Linux
 PostgreSQL 9.1.11 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 
4.6.3-1ubuntu5) 4.6.3, 64-bit

Slave:
Linux munn 3.2.0-65-generic #98-Ubuntu SMP Wed Jun 11 20:27:07 UTC 2014 x86_64 
x86_64 x86_64 GNU/Linux
 PostgreSQL 9.1.13 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 
4.6.3-1ubuntu5) 4.6.3, 64-bit

The slave is connected to the master through streaming replication. I know I’m 
not perfectly in sync, and I’ll switch the roles in a week, after a burn-in 
period for the slave.

I was running the following query on the slave:

select persona_followers_count, count(*)
from show_interaction_bindings
where market_id = 'e8639780-db2d-012e-ebb6-40400fe46aa7'
  and interaction_created_at >= '2014-06-18'
  and service_name = 'Facebook'
  and show_id = 'ed7e4800-c370-012e-eb90-40400fe46aa7'
group by 1 order by 2 desc;

and I received this error message:

ERROR:  index "canada_2014_06_23_pkey" contains unexpected zero page at block 
135764
HINT:  Please REINDEX it.

show_interaction_bindings is a master table with partitions by market_id and 
week, such that I have 500+ child tables. bindings.2014_06_23’s schema looks 
like:


Table "bindings.canada_2014_06_23"
   Column|Type |  
Modifiers
-+-+--
 interaction_id  | uuid| not null
 show_id | uuid| not null
 service_name| character varying(8)| not null
 interaction_created_at  | timestamp without time zone | not null
 interaction_service_id  | text| not null
 interaction_text| text|
 interaction_source  | text|
 interaction_platform_name   | character varying(256)  |
 interaction_app_name| character varying(256)  |
 interaction_in_reply_to_screen_name | character varying(256)  |
 interaction_in_reply_to_status_id   | character varying(64)   |
 interaction_in_reply_to_user_id | character varying(64)   |
 interaction_language| character varying(12)   |
 interaction_retweet_count   | integer |
 interaction_retweet_id  | character varying(64)   |
 persona_service_id  | character varying(64)   | not null
 persona_screen_name | character varying(256)  | not null
 persona_name| character varying(256)  | not null
 persona_description | text|
 persona_location| text|
 persona_lang| character varying(16)   |
 persona_profile_image_url   | text|
 persona_created_at  | timestamp without time zone |
 persona_followers_count | integer | not null 
default 130
 persona_friends_count   | integer | not null 
default 130
 persona_time_zone   | text|
 persona_utc_offset  | integer |
 persona_gender  | character varying(12)   |
 persona_age_group   | character varying(12)   |
 persona_country | character varying(40)   |
 persona_region  | character varying(80)   |
 persona_city| character varying(80)   |
 market_id   | uuid|
 score   | real|
 positive_sentiment  | real| not null 
default 0.0
 negative_sentiment  | real| not null 
default 0.0
 like_or_favorite| boolean |
 binder_log  | text|
Indexes:
"canada_2014_06_23_pkey" PRIMARY KEY, btree (show_id, 
interaction_service_id, service_name)
Check constraints:
"market_and_created_at_in_partition" CHECK (market_id = 
'e8639780-db2d-012e-ebb6-40400fe46aa7'::uuid AND '2014-06-23 
07:00:00'::timestamp without time zone <= interaction_created_at AND 
interaction_created_at < '2014-06-30 07:00:00'::timestamp without time zone)
Inherits: show_interaction_bindings

I just rebuilt the slave two days ago on ZFS, from XFS.

Re: [GENERAL] pg_dump slower than pg_restore

2014-07-04 Thread Bosco Rama
On 07/03/14 22:51, David Wall wrote:
> 
> On 7/3/2014 11:47 AM, Eduardo Morras wrote:
>> No, there's nothing wrong. All transparent compressed objects stored 
>> in database, toast, lo, etc.. is transparently decompressed while 
>> pg_dump access them and then you gzip it again. I don't know why it 
>> doesn't dump the compressed data directly. 
> 
> That sounds odd, but if pg_dump decompresses the large objects and then 
> I gzip them on backup, doesn't the same more or less happen in reverse 
> when I pg_restore them?  I mean, I gunzip the backup and then pg_restore 
> must compress the large objects when it writes them back.

I believe there is a little confusion here.  (If not, I apologize in
advance :-) ).

There are at least 3 forms of (de)compression going on here:
   1) Application level:  The data itself is presented to the
  application in a compressed format (image file, etc.) or
  it is compressed by the application prior to presentation to PG.
  The fact that this data is compressed is not relevant to the
  following;

   2) PG server-side: Automated TOAST (de)compression which may or may
  not be enabled for a table/column.

   3) Dump/restore/tools:  the pg dump/restore and any tools you
  use to (de)compress the DB backup.  Without due care this can
  end up in multiple attempts to (de)compress the same file, as
  we've seen. :-)

Regardless of what happens in 2) & 3) the application-level compression
state in 1) will not be affected.

Thus a blob that represents a gzip'd file presented to PG in 1) will
remain compressed through all 3 stages.  It is *not* auto-magically
uncompressed at any stage.

I believe Eduardo was referring to 2) which, because it is an
automated, transparent, server-side operation needs to be performed
even as part of a dump/restore.
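
(As an aside, two knobs that may be relevant here - untested, and the table/column
names below are made up: pg_dump's own compression can be switched off with
"pg_dump -Fc -Z 0 ..." if a later stage compresses anyway, and TOAST compression
can be disabled per column with

   ALTER TABLE mytable ALTER COLUMN myblob SET STORAGE EXTERNAL;

Neither of these affects the application-level compression in 1).)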

HTH,
Bosco.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query "top 10 and others"

2014-07-04 Thread Edson Richter
I would like to construct a query to be used in graphics (a pie chart, to be 
more precise), and to avoid having 500 slices I would like the query to return 
the top ten and then all the rest summed.

I know I can do it by using some repetition, like:

a) Assume "places" table with population with structure

create table places (
  id integer primary key,
  country varchar(3),
  state varchar(50),
  city varchar(50),
  population integer
)

b) There are not so many records in the table (my country has ~5500 cities and 27 
states), and initially there will be only 1 country.

with QRY as (select C1.country, C1.state, sum(C1.population)
  from places C1
  group by 1, 2
  order by 3 DESC
  limit 10)

select * from QRY
union
select 'others' as "country", '' as "state", sum(population)
  from places
 where not exists (select 1 from QRY
                    where places.country = QRY.country and places.state = QRY.state)


Can I simplify the query by using some sort of window function or other 
PostgreSQL feature I don't know yet?


Thanks,

Edson Richter

  

Re: [GENERAL] Query "top 10 and others"

2014-07-04 Thread David G Johnston
Edson Richter wrote
> I would like to construct a query, to be used in graphics (Pie Chart, to
> be more precise), and to avoid having 500 slices, I would like to make a
> query that returns the top ten, and then all the rest summed.
> 
> I know I can do it by using some repetition, like:
> 
> a) Assume "places" table with population with structure
> 
> create table places (
>   id as integer primary key,
>   country as varchar(3),
>   state as varchar(50),
>   city as varchar(50),
>   population integer
> )
> 
> b) There are not so many records in table (my country have ~5500 cities,
> and 27 states), and initially, there will be only 1 country.
> 
> with QRY as (select C1.country, C1.state, sum(C1.population)
>   from places C1
>   group by 1, 2
>order by 3 DESC
>   limit 10)
> 
> select * from QRY
> union
> select 'others' as "country", '' as "state", sum(population)
>   from places
>  where not exists (select 1 from QRY where country = QRY.country and state
> = QRY.state)
> 
> 
> Can I simplify the query by using some sort of window function or other
> PostgreSQL feature I don't know yet?

This is the best approach, though it may be faster to calculate all the sums
in the CTE, then LIMIT 10 in the first part of the union and sum the remaining
sums in the second part.  That way you do not have to scan the entire places
table twice.

David J.






--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Query-top-10-and-others-tp5810597p5810601.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re : [GENERAL] Query "top 10 and others"

2014-07-04 Thread Edson Richter
Would you please provide an example, even hypothetical?

Regards,
Edson Richter

-- Original message --
From: David G Johnston
Date: 04/07/2014 19:22
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Query "top 10 and others"

Edson Richter wrote
> I would like to construct a query, to be used in graphics (Pie Chart, to
> be more precise), and to avoid having 500 slices, I would like to make a
> query that returns the top ten, and then all the rest summed.
> 
> I know I can do it by using some repetition, like:
> 
> a) Assume "places" table with population with structure
> 
> create table places (
>   id as integer primary key,
>   country as varchar(3),
>   state as varchar(50),
>   city as varchar(50),
>   population integer
> )
> 
> b) There are not so many records in table (my country have ~5500 cities,
> and 27 states), and initially, there will be only 1 country.
> 
> with QRY as (select C1.country, C1.state, sum(C1.population)
>   from places C1
>   group by 1, 2
>    order by 3 DESC
>   limit 10)
> 
> select * from QRY
> union
> select 'others' as "country", '' as "state", sum(population)
>   from places
>  where not exists (select 1 from QRY where country = QRY.country and state
> = QRY.state)
> 
> 
> Can I simplify the query by using some sort of window function or other
> PostgreSQL feature I don't know yet?

This is the best solution; though it may be faster to calculate all the sums
in the CTE then limit 10 the first union part and sum the remaining sums in
the second part.  That way you do not have to scan the entire places table
twice.

David J.






--
View this message in context: http://postgresql.1045698.n5.nabble.com/Query-top-10-and-others-tp5810597p5810601.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



Re: Re : [GENERAL] Query "top 10 and others"

2014-07-04 Thread David Johnston
>
> > with QRY as (select C1.country, C1.state, sum(C1.population)
> >   from places C1
> >   group by 1, 2
> >order by 3 DESC
> >   limit 10)
> >
> > select * from QRY
> > union
> > select 'others' as "country", '' as "state", sum(population)
> >   from places
> >  where not exists (select 1 from QRY where country = QRY.country and state
> > = QRY.state)
> >
>
> (not tested)

​with QRY as ( SELECT country, state, sum(population) as st_pop FROM places
GROUP BY country, state )
, u1 AS ​( SELECT country, state, st_pop FROM QRY ORDER BY st_pop DESC
LIMIT 10 )
, u2 AS ( SELECT 'other' AS country, '' AS state, sum(st_pop) FROM QRY
WHERE NOT EXISTS (
SELECT 1 FROM u1 WHERE (QRY.country, QRY.state) = (u1.country, u1.state)
)
)
SELECT * FROM u1
UNION ALL
SELECT * FROM u2
;
David J.


[GENERAL] memory leak while trying to update/alter column in postgresql

2014-07-04 Thread Madhurima Das
I am trying to update/alter two columns - comment and id - in a table called
xxx using postgresql. The compiler does not show any errors, but
Valgrind reports major memory leak errors. The code is:

int main()
{
PGconn *conn;
PGresult *res;
int i=0,nFields=0,row=0,col=0;

conn = PQconnectdb("dbname=test1 host=localhost user=postgres
password=yyy");
if(PQstatus(conn) == CONNECTION_BAD)
{
fprintf(stderr, "Connection to database \"%s\" failed.\n",
PQerrorMessage(conn));
fprintf(stderr, "%s", PQerrorMessage(conn));
exit_nicely(conn);
}

res = PQexec(conn, "IF COL_LENGTH('xxx','comment') IS NULL");
if(res)
res = PQexec(conn, "ALTER TABLE xxx ADD comment VARCHAR(500)");
else
res = PQexec(conn, "UPDATE TABLE xxx ADD comment VARCHAR(500)");

res = PQexec(conn, "IF COL_LENGTH('xxx','id') IS NULL");
if(res)
res = PQexec(conn, "ALTER TABLE xxx ADD id VARCHAR(50)");
else
res = PQexec(conn, "UPDATE TABLE xxx ADD id VARCHAR(50)");

  res = PQexec(conn, "SELECT * FROM xxx");
if((!res) || (PQresultStatus(res) != PGRES_TUPLES_OK))
{
fprintf(stderr, "SELECT command did not return tuples properly\n");
PQclear(res);
}

PQclear(res);
PQfinish(conn);

return 0;
}

Valgrind gives the following errors:

$ valgrind --track-origins=yes --leak-check=full ./Gwidd_uniprot_map2 xaa

==4525== Memcheck, a memory error detector
==4525== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==4525== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==4525== Command: ./Gwidd_uniprot_map2 xaa
==4525==
--4525-- ./Gwidd_uniprot_map2:
--4525-- dSYM directory has wrong UUID; consider using --dsymutil=yes
==4525==
==4525== HEAP SUMMARY:
==4525== in use at exit: 262,994 bytes in 751 blocks
==4525==   total heap usage: 1,012 allocs, 261 frees, 345,158 bytes
allocated
==4525==
==4525== 2,248 (200 direct, 2,048 indirect) bytes in 1 blocks are
definitely lost in loss record 400 of 414
==4525==at 0x100011345: malloc (vg_replace_malloc.c:236)
==4525==by 0x10007B7E0: PQmakeEmptyPGresult (in /usr/lib/libpq.5.dylib)
==4525==by 0x10008DF30: pqGetErrorNotice3 (in /usr/lib/libpq.5.dylib)
==4525==by 0x10008CC00: pqParseInput3 (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007ECD9: parseInput (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007EE3F: PQgetResult (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007F6F1: PQexecFinish (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007F1DC: PQexec (in /usr/lib/libpq.5.dylib)
==4525==by 0x1199A: main (in ./Gwidd_uniprot_map2)
==4525==
==4525== 2,248 (200 direct, 2,048 indirect) bytes in 1 blocks are
definitely lost in loss record 401 of 414
==4525==at 0x100011345: malloc (vg_replace_malloc.c:236)
==4525==by 0x10007B7E0: PQmakeEmptyPGresult (in /usr/lib/libpq.5.dylib)
==4525==by 0x10008DF30: pqGetErrorNotice3 (in /usr/lib/libpq.5.dylib)
==4525==by 0x10008CC00: pqParseInput3 (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007ECD9: parseInput (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007EE3F: PQgetResult (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007F6F1: PQexecFinish (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007F1DC: PQexec (in /usr/lib/libpq.5.dylib)
==4525==by 0x119B5: main (in ./Gwidd_uniprot_map2)
==4525==
==4525== 2,248 (200 direct, 2,048 indirect) bytes in 1 blocks are
definitely lost in loss record 402 of 414
==4525==at 0x100011345: malloc (vg_replace_malloc.c:236)
==4525==by 0x10007B7E0: PQmakeEmptyPGresult (in /usr/lib/libpq.5.dylib)
==4525==by 0x10008DF30: pqGetErrorNotice3 (in /usr/lib/libpq.5.dylib)
==4525==by 0x10008CC00: pqParseInput3 (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007ECD9: parseInput (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007EE3F: PQgetResult (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007F6F1: PQexecFinish (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007F1DC: PQexec (in /usr/lib/libpq.5.dylib)
==4525==by 0x119DF: main (in ./Gwidd_uniprot_map2)
==4525==
==4525== 2,248 (200 direct, 2,048 indirect) bytes in 1 blocks are
definitely lost in loss record 403 of 414
==4525==at 0x100011345: malloc (vg_replace_malloc.c:236)
==4525==by 0x10007B7E0: PQmakeEmptyPGresult (in /usr/lib/libpq.5.dylib)
==4525==by 0x10008DF30: pqGetErrorNotice3 (in /usr/lib/libpq.5.dylib)
==4525==by 0x10008CC00: pqParseInput3 (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007ECD9: parseInput (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007EE3F: PQgetResult (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007F6F1: PQexecFinish (in /usr/lib/libpq.5.dylib)
==4525==by 0x10007F1DC: PQexec (in /usr/lib/libpq.5.dylib)
==4525==by 0x119FA: main (in ./Gwidd_uniprot_map2)
==4525==
==4525== LEAK SUMMARY:
==4525==definitely lost: 800 bytes in 4 blocks
==4525==indirectly lost: 8,192 bytes in 4 blocks
==4525==  possibly lost: 0 bytes in 0 blocks
==4525==still reachable: 254,002 bytes in 743 blocks
==4525== suppressed: 0 bytes in 0 blocks
==4

Re: [GENERAL] Random-looking primary keys in the range 100000..999999

2014-07-04 Thread Gavin Flower

On 05/07/14 01:24, Kynn Jones wrote:
I'm looking for a way to implement pseudorandom primary keys in the 
range 100000..999999.


The randomization scheme does not need to be cryptographically 
strong.  As long as it is not easy to figure out in a few minutes it's 
good enough.


My starting point for this is the following earlier message to this list:

http://www.postgresql.org/message-id/49f96730.4000...@postnewspapers.com.au

The answer given to it here

http://www.postgresql.org/message-id/448163db-cac5-4e99-8c4c-57cbc6f6af78@mm

...is really cool, but I don't see how to modify it for the case where 
the domain of the permutation has a cardinality that is not a power of 
2, as it is in my case (cardinality = 900000).


---

(In the crypto world there are "format preserving encryption" 
techniques that probably could do what I want to do, but their focus 
on cryptographic strength makes learning and implementing them tough 
going, plus, the performance will probably be poor, since high 
workloads are an asset for such crypto applications.  Since 
cryptographic strength is not something I need, I'm trying to find 
non-crypt-grade alternatives.)


Thanks in advance!

kynn


Hi Kynn,

How about  (note that 'payload' could be any set of valid columns):

   -- using a crude Linear Congruential Generator
   -- not very random, but does NOT create duplicates


   DROP TABLE IF EXISTS rtab;
   DROP SEQUENCE IF EXISTS rseq;

   CREATE SEQUENCE rseq;

   CREATE TABLE rtab
   (
id int PRIMARY KEY default(100000 + (nextval('rseq') * 543537 +
   997) % 900000),
payload int NOT NULL
   );

   INSERT INTO rtab (payload) VALUES (generate_series(1, 10));

   TABLE rtab;

Sample output:

   id   | payload
--------+---------
 644534 |   1
 288071 |   2
 831608 |   3
 475145 |   4
 118682 |   5
 662219 |   6
 305756 |   7
 849293 |   8
 492830 |   9
 136367 | 10
 679904 | 11
 323441 | 12
 866978 | 13
 510515 | 14
 154052 | 15
 697589 | 16
 341126 | 17
 884663 | 18
 528200 | 19
 171737 | 20



Cheers,
Gavin


Re: [GENERAL] Random-looking primary keys in the range 100000..999999

2014-07-04 Thread Gavin Flower

On 05/07/14 15:48, Gavin Flower wrote:

On 05/07/14 01:24, Kynn Jones wrote:
I'm looking for a way to implement pseudorandom primary keys in the 
range 100000..999999.


The randomization scheme does not need to be cryptographically 
strong.  As long as it is not easy to figure out in a few minutes 
it's good enough.


My starting point for this is the following earlier message to this list:

http://www.postgresql.org/message-id/49f96730.4000...@postnewspapers.com.au

The answer given to it here

http://www.postgresql.org/message-id/448163db-cac5-4e99-8c4c-57cbc6f6af78@mm

...is really cool, but I don't see how to modify it for the case 
where the domain of the permutation has a cardinality that is not a 
power of 2, as it is in my case (cardinality = 900000).


---

(In the crypto world there are "format preserving encryption" 
techniques that probably could do what I want to do, but their focus 
on cryptographic strength makes learning and implementing them tough 
going, plus, the performance will probably be poor, since high 
workloads are an asset for such crypto applications.  Since 
cryptographic strength is not something I need, I'm trying to find 
non-crypt-grade alternatives.)


Thanks in advance!

kynn


Hi Kynn,

How about  (note that 'payload' could be any set of valid columns):

-- using a crude Linear Congruential Generator
-- not very random, but does NOT create duplicates


DROP TABLE IF EXISTS rtab;
DROP SEQUENCE IF EXISTS rseq;

CREATE SEQUENCE rseq;

CREATE TABLE rtab
(
id int PRIMARY KEY default(100000 + (nextval('rseq') * 543537
+ 997) % 900000),
payload int NOT NULL
);

INSERT INTO rtab (payload) VALUES (generate_series(1, 10));

TABLE rtab;

Sample output:

id   | payload
+-
 644534 |   1
 288071 |   2
 831608 |   3
 475145 |   4
 118682 |   5
 662219 |   6
 305756 |   7
 849293 |   8
 492830 |   9
 136367 |  10
 679904 |  11
 323441 |  12
 866978 |  13
 510515 |  14
 154052 |  15
 697589 |  16
 341126 |  17
 884663 |  18
 528200 |  19
 171737 |  20



Cheers,
Gavin

Hmm...

for a 10 times larger range
id   int PRIMARY KEY default(1000000 + (nextval('rseq') * 
543537 + 997) % 9000000),

also works!



Re: [GENERAL] pg_dump slower than pg_restore

2014-07-04 Thread Tom Lane
David Wall  writes:
> On 7/4/2014 7:19 AM, Tom Lane wrote:
>> You haven't given us much info about the contents of this database.
>> Are there a lot of tables? functions? large objects?  How many is
>> "a lot", if so?

> There are only 32 table, no functions, but mostly large objects. Not 
> sure how to know about the LOs, but a quick check from the table sizes I 
> estimate at only 2GB, so 16GB could be LOs.  There are 7,528,803 entries 
> in pg_catalog.pg_largeobject.

Hmm ... how many rows in pg_largeobject_metadata?
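
(Something like "SELECT count(*) FROM pg_largeobject_metadata;" should tell us.)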

> Basic top stats while running show:

>   7547 esignfor  30  10 1148m 1.0g  852 S  2.3 26.9 14:10.27 pg_dump 
> --format=c --oids ibc01

That's a pretty large resident size for pg_dump :-( ... you evidently
have a lot of objects of some sort, and I'm betting it's LOs, but
let's make sure.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general