Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave. How to compact it?

2012-03-12 Thread Nur Hidayat
Hi Aleksey,

I once had the same problem. In my case it was because most of my table
columns used the text datatype.
When I changed the field type to character varying (1000), the database size
was reduced significantly.

Unfortunately, I haven't investigated further, but it seems to come down to how
Postgres stores data.

Regards,
Nur Hidayat





On Mon, Mar 12, 2012 at 1:32 PM, Aleksey Tsalolikhin <
atsaloli.t...@gmail.com> wrote:

> Dear Scott,
>
>  When I pg_dump -t bigtablename on the Slony slave, the dump file is
> 212G in size.
>
>  I am unable to perform the same test on the master until I get a
> maintenance window, which may not be for a few weeks, as it does
> impact our production system when we dump from the master (the web app
> gets noticeably slower).
>
>  I compare this 212 GB (logical size) with the 550 GB reported
> size (physical size on disk), which matches the "du -sh" output.
>
>  I remember now I had a similar issue about a year and a half or so ago,
> when we had a jump in database size, also with replication involved,
> and one site became bigger than the other.  I talked to Bruce M. about
> it at some conference and he suggested looking at his site, where he
> has an explanation of how Postgres stores data, low-level, like the structure
> of a page and so on.  Unfortunately I was unable to carve out the time
> to drill into it then, and just continued running with the larger database
> size...  so now this issue is coming back to haunt me, even bigger
> now.
>
>  The size had doubled earlier, and now it has nearly tripled.
>
>  I'm afraid the easiest (quickest) solution will be for me to destroy
> the RAID 1E array and rebuild it as a RAID 5 array, which would give
> me a bigger filesystem, buying me time to study up on what Bruce
> suggested, or else to hire a professional Postgres consultant (if
> $WORK ever coughs up the money).
>
>  Our resident Oracle DBA expert (since we don't have a Postgres one)
> suggested I try truncating the table on the slave (with replication
> down) and then restoring it from this pg_dump, just to see if the size
> of the new table will be 200 GB or 500 GB.  If 200, we're home free;
> if 500, we need to continue to investigate.
>
>  In the meantime, I owe you the size of the bigtable from
> production...   if anybody has any other suggestions, I am all ears.
>
> Yours very truly,
> Aleksey
>
>
> On 3/8/12, Scott Marlowe  wrote:
> > On Thu, Mar 8, 2012 at 1:10 PM, Aleksey Tsalolikhin
> >  wrote:
> >>> On Thu, Mar 8, 2012 at 12:10 PM, Aleksey Tsalolikhin
> >>>  wrote:
>  On Wed, Mar 7, 2012 at 10:05 AM, Aleksey Tsalolikhin
>   wrote:
> >  We're replicating a PostgreSQL 8.4.x database using Slony1-1.2.x
> >
> >
> >  My biggest table measures 154 GB on the origin, and 533 GB on
> >  the slave.
> >
> >  Why is my slave bigger than my master?  How can I compact it,
> please?
> 
> 
>  On Wed, Mar 7, 2012 at 11:54 PM, Stuart Bishop
>   wrote back:
> >
> > Do you have a long running transaction on the slave? vacuum will not
> > reuse space that was freed after the longest running transaction.
> >
> > You need to use the CLUSTER command to compact it, or VACUUM FULL
> > followed by a REINDEX if you don't have enough disk space to run
> > CLUSTER. And neither of these will do anything if the space is still
> > live because some old transaction might still need to access the old
> > tuples.
> 
>  Dear Stuart,
> 
>   We do not run any transactions on the slave besides we pg_dump the
>  entire database every 3 hours.  I don't have enough disk space to
>  CLUSTER
>  the table; I ran VACUUM FULL yesterday, and I just fired up a REINDEX
>  TABLE.
> 
>   I'd love to get some insight into how much logical data I have versus
>  how
>  much physical space it is taking up.  Is there some admin tool or
>  command
>  or query that will report that?  For each table (and index), I'd like
>  to know how
>  much data is in that object (logical data size) and how much space it
> is
>  taking
>  up on disk (physical data size).
> >>
> >>
> >> On Thu, Mar 8, 2012 at 11:58 AM, Scott Marlowe  >
> >> wrote:
> >>> Do you do things like truncate on the master?  Cause truncates don't
> >>> get replicated in slony.
> >>
> >>
> >> Dear Scott,
> >>
> >>  No, we do not truncate this table on the master.  We only add to it.
> >>
> >>  The REINDEX TABLE completed and the table is still swollen.
> >
> > If you pg_dump -t tablename from each machine, are the backups about
> > the same size?
> >
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
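(For the question above about logical versus physical size: a minimal sketch, not from the thread, of catalog queries that report per-table disk usage and dead-tuple counts. 'bigtablename' is the table name used earlier; everything else is standard PostgreSQL functions and views.)

SELECT pg_size_pretty(pg_relation_size('bigtablename'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('bigtablename')) AS size_with_indexes_and_toast;

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'bigtablename';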


Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave. How to compact it?

2012-03-12 Thread John R Pierce

On 03/12/12 12:06 AM, Nur Hidayat wrote:


I once had the same problem. In my case it was because most of my table
columns used the text datatype.
When I changed the field type to character varying (1000), the database size
was reduced significantly.


Unfortunately, I haven't investigated further, but it seems to come down to how
Postgres stores data.


that doesn't make any sense.   text and character varying storage is 
exactly the same; the only difference is that varchar has an optional 
length constraint




--
john r pierceN 37, W 122
santa cruz ca mid-left coast


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave. How to compact it?

2012-03-12 Thread Nur Hidayat
Yes, I am aware of that, but that's the fact I'm facing.
Right now I'm happy enough that my system runs well without eating up my drive :)
I'll investigate more later when time is available :)

Cheers,
Nur Hidayat
 


.::.
Sent from my BlackBerry®
powered by The ESQ Way 165

-Original Message-
From: John R Pierce 
Sender: pgsql-general-owner@postgresql.orgDate: Mon, 12 Mar 2012 00:39:28 
To: 
Subject: Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave.
 How to compact it?

On 03/12/12 12:06 AM, Nur Hidayat wrote:
>
> I once have the same problem. In my case it's because most of my table 
> using text datatype.
> When I change the field type to character varying (1000) database size 
> reduced significantly
>
> Unfortunately, I haven't investigate more, but it looks like how 
> postgres stores data

that doesn't make any sense.   text and character varying storage is 
exactly the same; the only difference is that varchar has an optional 
length constraint



-- 
john r pierceN 37, W 122
santa cruz ca mid-left coast


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave. How to compact it?

2012-03-12 Thread Nur Hidayat
FYI, after I changed the text fields into character varying, I vacuumed the whole 
database, resulting in a much smaller database size.

Cheers,
NH



.::.
Sent from my BlackBerry®
powered by The ESQ Way 165

-Original Message-
From: "Nur Hidayat" 
Date: Mon, 12 Mar 2012 08:18:09 
To: John R Pierce; 
Reply-To: hidayat...@gmail.com
Subject: Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave. How 
to compact it?

Yes, I am aware of that, but that's the fact I'm facing.
Right now I'm happy enough that my system runs well without eating up my drive :)
I'll investigate more later when time is available :)

Cheers,
Nur Hidayat
 


.::.
Sent from my BlackBerry®
powered by The ESQ Way 165

-Original Message-
From: John R Pierce 
Sender: pgsql-general-owner@postgresql.orgDate: Mon, 12 Mar 2012 00:39:28 
To: 
Subject: Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave.
 How to compact it?

On 03/12/12 12:06 AM, Nur Hidayat wrote:
>
> I once have the same problem. In my case it's because most of my table 
> using text datatype.
> When I change the field type to character varying (1000) database size 
> reduced significantly
>
> Unfortunately, I haven't investigate more, but it looks like how 
> postgres stores data

that doesn't make any sense.   text and character varying storage is 
exactly the same; the only difference is that varchar has an optional 
length constraint



-- 
john r pierceN 37, W 122
santa cruz ca mid-left coast


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Call for Google Summer of Code (GSoC) 2012: Project ideas?

2012-03-12 Thread Albe Laurenz
Selena Deckelmann wrote:
> On Thursday, March 8, 2012 at 11:40 AM, Stefan Keller wrote:
>> I do have a student who is interested in participating at the Google
>> Summer of Code (GSoC) 2012
>> Now I have the "burden" to look for a cool project... Any ideas?
> 
> Also those who are on this thread, we are collecting ideas on the wiki:
> 
> http://wiki.postgresql.org/wiki/GSoC_2012

I have added Foreign Data Wrappers.
I think that would be a good idea for anybody who wants a clearly
defined project - the API is (currently changing but) documented,
it's a good opportunity to learn hacking PostgreSQL server code,
and you can leverage your knowledge of other software.

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search and ILIKE type clauses.

2012-03-12 Thread mgould
Tom,

We made most of our text/varchar columns the citext data type so that we
could do case-insensitive searches.  Is this going to negate most of the
index searches?  It appeared to our DBA that it would be easier to use
the citext data type than to need to use ILIKE instead?

Michael Gould
Intermodal Software Solutions, LLC
904-226-0978
 
 
 Original Message 
Subject: Re: [GENERAL] full text search and ILIKE type clauses.
From: Tom Lane 
Date: Sun, March 11, 2012 7:56 pm
To: Tim Uckun 
Cc: pgsql-general 

Tim Uckun  writes:
> I want to be able to search a lot of fields using queries that use
> ILIKE and unfortunately many of the queries will be using the
> '%SOMETHING%' or '%SOMETHING' type clauses. Since indexes are useless
> on those I was thinking I could use tsvectors but I can't figure out
> how to accomplish this.

Full text search is not going to help for this unless you are willing to
be very lax about replicating the semantics of ILIKE. For example,
ILIKE '%foo%' should match "foo" anywhere within a word, but FTS is not
going to be able to do better than finding words that begin with "foo".

If you're using 9.1, you might look into contrib/pg_trgm instead.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
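(A rough sketch, not part of the thread, of the contrib/pg_trgm approach Tom mentions, assuming PostgreSQL 9.1 and a hypothetical table docs with a text column body; 9.1 can use such an index for LIKE/ILIKE pattern matches.)

CREATE EXTENSION pg_trgm;
CREATE INDEX docs_body_trgm_idx ON docs USING gin (body gin_trgm_ops);
-- with the index in place, the planner can use it for queries such as:
SELECT * FROM docs WHERE body ILIKE '%something%';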


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Error installing postgresq91-python package

2012-03-12 Thread Martin Gregorie
On Sat, 2012-03-10 at 14:08 -0800, PgSQL wrote:
> >>>If the OP can identify and download the relevant .rpm file
> 
Have you checked the CentOS bugzilla to see if the yum failure has been
reported? You should raise a bug report if it isn't there: if nobody
reports a bug it's unlikely to get fixed.

> 
> I download this packages:
> 
When you download rpms manually, you use the rpm command to install
the rpm packages, not yum.

> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/elfutils-0.137-3.el5.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/elfutils-libs-0.137-3.el5.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/elfutils-libelf-0.137-3.el5.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/elfutils-libs-0.137-3.el5.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/expat-1.95.8-8.3.el5_5.3.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/gmp-4.1.4-10.el5.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/libxml2-2.6.26-2.1.12.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/libxml2-python-2.6.26-2.1.12.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/m2crypto-0.16-8.el5.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-2.4.3-44.el5.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-elementtree-1.2.6-5.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-iniparse-0.2.3-4.el5.noarch.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-sqlite-1.1.7-1.2.1.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-urlgrabber-3.1.0-6.el5.noarch.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/readline-5.1-3.el5.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/rpm-4.4.2.3-22.el5.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/rpm-libs-4.4.2.3-22.el5.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/rpm-python-4.4.2.3-22.el5.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/sqlite-3.3.6-5.i386.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/yum-3.2.22-37.el5.centos.noarch.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/yum-NetworkManager-dispatcher-1.1.16-16.el5.centos.noarch.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/yum-fastestmirror-1.1.16-16.el5.centos.noarch.rpm
> wget 
> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/yum-metadata-parser-1.1.2-3.el5.centos.i386.rpm
> 
Why did you download these? Only five of them appear to have anything to
do with Python and four to be connected with yum.  

Which, if any, of them contain mx? 

Hint: try using "rpm -qip filename.rpm" or "rpm -qilp filename.rpm" to
see what is in them. 

In my Fedora15 installation the mx package is in the mx and mx-devel
packages: running "yum info '*mx*'" told me that.


Martin



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Error installing postgresq91-python package

2012-03-12 Thread PgSQL
Thanks Martin.

Sure! I downloaded these packages to the same folder, but I used:

rpm -Uvh *.rpm

to install all of these packages.

PS:
My English is poor; my language is Spanish.

Thanks

2012/3/12, Martin Gregorie-2 [via PostgreSQL]
:
>
>
> On Sat, 2012-03-10 at 14:08 -0800, PgSQL wrote:
>> >>>If the OP can identify and download the relevant .rpm file
>>
> Have you checked the CentOS bugzilla to see if the yum failure has been
> reported? You should raise a bug report if it isn't there: if nobody
> reports a bug its unlikely to get fixed.
>
>>
>> I download this packages:
>>
> When you download rpms manually, you use the rpm command to install
> the rpm packages, not yum.
>
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/elfutils-0.137-3.el5.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/elfutils-libs-0.137-3.el5.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/elfutils-libelf-0.137-3.el5.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/elfutils-libs-0.137-3.el5.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/expat-1.95.8-8.3.el5_5.3.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/gmp-4.1.4-10.el5.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/libxml2-2.6.26-2.1.12.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/libxml2-python-2.6.26-2.1.12.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/m2crypto-0.16-8.el5.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-2.4.3-44.el5.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-elementtree-1.2.6-5.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-iniparse-0.2.3-4.el5.noarch.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-sqlite-1.1.7-1.2.1.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-urlgrabber-3.1.0-6.el5.noarch.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/readline-5.1-3.el5.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/rpm-4.4.2.3-22.el5.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/rpm-libs-4.4.2.3-22.el5.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/rpm-python-4.4.2.3-22.el5.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/sqlite-3.3.6-5.i386.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/yum-3.2.22-37.el5.centos.noarch.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/yum-NetworkManager-dispatcher-1.1.16-16.el5.centos.noarch.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/yum-fastestmirror-1.1.16-16.el5.centos.noarch.rpm
>> wget
>> http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/yum-metadata-parser-1.1.2-3.el5.centos.i386.rpm
>>
> Why did you download these? Only five of them appear to have anything to
> do with Python and four to be connected with yum.
>
> Which, if any, of them contain mx?
>
> Hint: try using "rpm -qip filename.rpm" or "rpm -qilp filename.rpm" to
> see what is in them.
>
> In my Fedora15 installation the mx package is in the mx and mx-devel
> packages: running "yum info '*mx*'" told me that.
>
>
> Martin
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
>



Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave. How to compact it?

2012-03-12 Thread Alban Hertroys
On 12 March 2012 09:20, Nur Hidayat  wrote:
> FYI, after I changed text field into character varying, I vaccuum the whole 
> database, resulting in much smaller database size

What I think happened in your case is that, because of the
data-type change, every row in the table got rewritten to a new version
where said column was of the new type. The subsequent vacuum then
removed the old (bloated) rows with the old type from the database
file.

And thus you ended up with a clean table.

> -Original Message-
> From: "Nur Hidayat" 
> Date: Mon, 12 Mar 2012 08:18:09
> To: John R Pierce; 
> Reply-To: hidayat...@gmail.com
> Subject: Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave. 
> How to compact it?
>
> Yes, I am aware of that, but that's the fact I'm facing
> Right now I'am happy enough my system runs well without eating up my drive :)
> I'll investigate more later when time available :)
>
> Cheers,
> Nur Hidayat
>
>
>
> .::.
> Sent from my BlackBerry®
> powered by The ESQ Way 165
>
> -Original Message-
> From: John R Pierce 
> Sender: pgsql-general-owner@postgresql.orgDate: Mon, 12 Mar 2012 00:39:28
> To: 
> Subject: Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave.
>  How to compact it?
>
> On 03/12/12 12:06 AM, Nur Hidayat wrote:
>>
>> I once have the same problem. In my case it's because most of my table
>> using text datatype.
>> When I change the field type to character varying (1000) database size
>> reduced significantly
>>
>> Unfortunately, I haven't investigate more, but it looks like how
>> postgres stores data
>
> that doesn't make any sense.   text and character varying storage is
> exactly hte same, the only difference is the varchar has an optional
> length constraint
>
>
>
> --
> john r pierce                            N 37, W 122
> santa cruz ca                         mid-left coast
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] fsync default setting and version

2012-03-12 Thread Ivan Sergio Borgonovo
Where can I check in which version the default setting for fsync was
changed?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Return keys for values with dot-separation in joined statements

2012-03-12 Thread Alexander Reichstadt
Hi,

Writing a client frontend, I started with MySQL and migrated to Postgres. Now I 
have found out that there is a difference in the way joined queries are returned:

The query

SELECT 
persons.id,persons.lastname,persons.firstname,persons.salutation,persons.title,addresses.address1,addresses.address2,addresses.city,addresses.zip
 FROM persons LEFT JOIN addresses_reference ON 
persons.id=addresses_reference.refid_persons LEFT JOIN addresses ON 
addresses_reference.refid_addresses=addresses.id   LIMIT 106 OFFSET 0

The response from postgres in the API:

address1 = "";
address2 = "";
city = "";
firstname = Olaf;
id = 5;
lastname = Taschenbier;
salutation = Frau;
title = "";
zip = "";

The response from mysql:

persons.address1 = "";
persons.address2 = "";
persons.city = "";
persons.firstname = Olaf;
persons.id = 5;
persons.lastname = Taschenbier;
persons.salutation = Frau;
persons.title = "";
persons.zip = "";


This is especially an issue in cases where two tables have identical 
fieldnames. What happens in such cases?

Is there a way, or some setting on the postgres server, to tell postgres to use the 
fieldnames exactly as provided in the select?

Thanks
Alex
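(An editor's sketch of one possible workaround, not from the thread: explicit aliases in the select list give each output column a unique, qualified-looking name, so clients see distinct keys even when joined tables share fieldnames. Table and column names are taken from the query above.)

SELECT persons.id       AS "persons.id",
       persons.lastname AS "persons.lastname",
       addresses.city   AS "addresses.city",
       addresses.zip    AS "addresses.zip"
FROM persons
LEFT JOIN addresses_reference ON persons.id = addresses_reference.refid_persons
LEFT JOIN addresses ON addresses_reference.refid_addresses = addresses.id;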


Re: [GENERAL] Return keys for values with dot-separation in joined statements

2012-03-12 Thread Tom Lane
Alexander Reichstadt  writes:
> Is there a way or some setting on postgres server to tell postgres to use the 
> fieldnames exactly as provided in the select?

You didn't say exactly which "API" you're using, but in libpq you could
look at the PQftable and PQftablecol column property inquiry functions.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Return keys for values with dot-separation in joined statements

2012-03-12 Thread Alexander Reichstadt
PGSQLKit for Cocoa.


Am 12.03.2012 um 18:09 schrieb Tom Lane:

> Alexander Reichstadt  writes:
>> Is there a way or some setting on postgres server to tell postgres to use 
>> the fieldnames exactly as provided in the select?
> 
> You didn't say exactly which "API" you're using, but in libpq you could
> look at the PQftable and PQftablecol column property inquiry functions.
> 
>   regards, tom lane
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Upgrade questions

2012-03-12 Thread Carson Gross
Hello All,

I've looked through the docs, but I'm unable to find complete answers to my
questions, so thanks in advance if you can lend any expertise.

Here's the situation I'm in (always a good opener, right? :) ):

We've got a postgres database with *a lot* of data in one table.  On the
order of 100 million rows at this point.  Postgres is, of course, handling
it with aplomb.

However, when the engineer who was setting up our schema got things in
place, he neglected to think through how many entries we might have
eventually and went with the default type for the 'id' column in this
database (yes, we are a rails shop; no, he shouldn't have been allowed to
design the schema).

This is 'integer' which, in my understanding, defaults to 32 bit.  Sadly,
this is not gonna get it done: we will hit that limit some time next year,
depending on growth.

OK, simple enough, just run something like this:

  ALTER TABLE my_table ALTER COLUMN id TYPE bigint;

However, given the size of this table, I have no idea how long something
like this might take.  In general I've had a tough time getting feedback
from postgres on the progress of a query, how long something might take,
etc.

So my question is: is there a way to understand roughly how long something
like this might take?  Our DB is out on crappy Amazon ec2 instances, so we
don't exactly have screamers set up.  Any tools I can use?  Any tips?  I
don't need anything exact, just on the order of minutes, hours, days or
weeks.

Again, thanks in advance,
Carson


[GENERAL] GROUP BY or alternative means to group

2012-03-12 Thread Alexander Reichstadt
Hi,

the following statement worked on mysql but gives me an error on postgres:

column "addresses.address1" must appear in the GROUP BY clause or be used in an 
aggregate function

I guess I am doing something wrong. I read the web answers, but none of them 
seem to meet my needs:

SELECT 
companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
 FROM companies JOIN addresses_reference ON 
companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON 
addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;


What I did now was create a view based on the above statement but without grouping. 
This returns a list with non-distinct values for all companies that have more 
than one address, which is correct. But in some cases I only need one address, 
and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me the 
first stored address for each company, disregarding any further addresses.

Is there any way to do this?

Thanks
Alex


Re: [GENERAL] full text search and ILIKE type clauses.

2012-03-12 Thread Tim Uckun
>
> We made most of our text, varchar columns citext data types so that we
> could do case insensitive searches.  Is this going to negate most of the
> index searches?  It appeared to our DBA that it would be easier to use
> citext data type then need to use ILIKE instead?
>

In the same vein...

Does postgres have case insensitive collations yet? Now that 9.1
supports column level collations that would be a really great option
for case insensitive queries.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Upgrade questions

2012-03-12 Thread Tim Uckun
> However, given the size of this table, I have no idea how long something
> like this might take.  In general I've had a tough time getting feedback
> from postgres on the progress of a query, how long something might take,
> etc.
>


You can always do this, which would result in minimal hassle:

create a new bigint field.
copy all the IDs to it.
index it in the background.
at a frequency of your choosing, sync the id field to the new field to keep it up to date.
at a time of your choosing, set the default for the new field to be
serial starting at max(id).
drop the ID field.
rename the new field to id.


That last bit could be done in a transaction and hopefully should not
take very long at all.
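(A rough SQL sketch of the steps above, assuming the table and column names from the original question, my_table and id; the sequence name is hypothetical.)

ALTER TABLE my_table ADD COLUMN new_id bigint;
UPDATE my_table SET new_id = id;                               -- copy the existing IDs
CREATE INDEX CONCURRENTLY my_table_new_id_idx ON my_table (new_id);
-- ...re-sync rows inserted since the copy as often as needed, then at cutover:
BEGIN;
ALTER TABLE my_table DROP COLUMN id;                           -- drops any sequence owned by the old column
ALTER TABLE my_table RENAME COLUMN new_id TO id;
CREATE SEQUENCE my_table_id_seq2 OWNED BY my_table.id;
SELECT setval('my_table_id_seq2', (SELECT max(id) FROM my_table));
ALTER TABLE my_table ALTER COLUMN id SET DEFAULT nextval('my_table_id_seq2');
COMMIT;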

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search and ILIKE type clauses.

2012-03-12 Thread mgould
Tim,

It is my understanding that since the citext extension is available,
this gives you what you're asking for, and at least at this point it isn't
going to be part of the core.
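(A minimal sketch, not from the thread, of setting up citext on PostgreSQL 9.1; the table and column names are hypothetical. A plain btree index on a citext column is usable for case-insensitive equality lookups.)

CREATE EXTENSION citext;
ALTER TABLE customers ALTER COLUMN email TYPE citext;
CREATE INDEX customers_email_idx ON customers (email);
SELECT * FROM customers WHERE email = 'Someone@Example.COM';  -- matches regardless of case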

Michael Gould
Intermodal Software Solutions, LLC
904-226-0978
 
 
 Original Message 
Subject: Re: [GENERAL] full text search and ILIKE type clauses.
From: Tim Uckun 
Date: Mon, March 12, 2012 1:20 pm
To: mgo...@isstrucksoftware.net
Cc: Tom Lane , pgsql-general


>
> We made most of our text, varchar columns citext data types so that we
> could do case insensitive searches.  Is this going to negate most of the
> index searches?  It appeared to our DBA that it would be easier to use
> citext data type then need to use ILIKE instead?
>

In the same vein...

Does postgres have case insensitive collations yet? Now that 9.1
supports column level collations that would be a really great option
for case insensitive queries.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] GROUP BY or alternative means to group

2012-03-12 Thread Bartosz Dmytrak
Hi,
You can use one of windowing function:
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
http://www.postgresql.org/docs/9.1/static/functions-window.html
this could be rank() in a subquery or first_value(value any), but there could
be performance issues

another solution could be a boolean flag "default" in table address_reference
which should be unique for a single company, I mean the value true should be
unique - this could be reached by a unique partial index on column
refid_companies
with the condition default = true
http://www.postgresql.org/docs/9.1/static/indexes-partial.html#INDEXES-PARTIAL-EX3

hope your pg version supports windowing functions (as I remember, 8.4 and
above)

Of course there is a solution with subquery which finds min id in table
addresses of each refid_companies in table addresses_reference and this
subquery is joined with companies table, but I am afraid this is not the
best one.

Regards,
Bartek


2012/3/12 Alexander Reichstadt 

> Hi,
>
> the following statement worked on mysql but gives me an error on postgres:
>
> *column "addresses.address1" must appear in the GROUP BY clause or be
> used in an aggregate function*
>
> I guess I am doing something wrong. I read the web answers, but none of
> them seem to meet my needs:
>
> *SELECT 
> companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
> FROM companies JOIN addresses_reference ON 
> companies.id=addresses_reference.refid_companies
> LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.idGROUP 
> BY
> companies.id;*
>
>
> What I did now was create a view based on above statement but without
> grouping. This returns a list with non-distinct values for all companies
> that have more than one address, which is correct. But in some cases I only
> need one address and the problem is that I cannot use distinct.
>
> I wanted to have some way to display a companies list that only gives me
> the first stored addresses related, and disregard any further addresses.
>
> Is there any way to do this?
>
> Thanks
> Alex
>


Solved [Re: [GENERAL] GROUP BY or alternative means to group]

2012-03-12 Thread Alexander Reichstadt
So the mysql way for group by seems to be non-standard.

What works for postgres is the DISTINCT ON (fieldname) approach.

Thanks
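(A sketch of the DISTINCT ON form applied to the query from the original message; the ORDER BY on addresses.id is an assumption, and it is what decides which address counts as the "first" one per company.)

SELECT DISTINCT ON (companies.id)
       companies.id, companies.name, companies.organizationkind,
       addresses.address1, addresses.address2, addresses.city, addresses.zip
FROM companies
JOIN addresses_reference ON companies.id = addresses_reference.refid_companies
LEFT JOIN addresses ON addresses_reference.refid_addresses = addresses.id
ORDER BY companies.id, addresses.id;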



Am 12.03.2012 um 20:35 schrieb Alexander Reichstadt:

> Hi,
> 
> the following statement worked on mysql but gives me an error on postgres:
> 
> column "addresses.address1" must appear in the GROUP BY clause or be used in 
> an aggregate function
> 
> I guess I am doing something wrong. I read the web answers, but none of them 
> seem to meet my needs:
> 
> SELECT 
> companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
>  FROM companies JOIN addresses_reference ON 
> companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON 
> addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;
> 
> 
> What I did now was create a view based on above statement but without 
> grouping. This returns a list with non-distinct values for all companies that 
> have more than one address, which is correct. But in some cases I only need 
> one address and the problem is that I cannot use distinct.
> 
> I wanted to have some way to display a companies list that only gives me the 
> first stored addresses related, and disregard any further addresses.
> 
> Is there any way to do this?
> 
> Thanks
> Alex



Re: [GENERAL] GROUP BY or alternative means to group

2012-03-12 Thread Kiriakos Georgiou
Instead of the joins you can use a subquery to get the first address.

Or you can do the joins without the group by and use row_number() 
over(partition by companies.id) on the select list to label each company 
address with a number starting at 1.  You can just keep rows that have 
row_number = 1.  See 
http://www.postgresql.org/docs/current/static/tutorial-window.html to get the 
feeling how window functions work.

Kiriakos
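(A sketch of the row_number() approach described above, using the tables from the thread; ordering by addresses.id is an assumption for what "first" means.)

SELECT *
FROM (
    SELECT companies.id, companies.name,
           addresses.address1, addresses.city, addresses.zip,
           row_number() OVER (PARTITION BY companies.id ORDER BY addresses.id) AS rn
    FROM companies
    JOIN addresses_reference ON companies.id = addresses_reference.refid_companies
    LEFT JOIN addresses ON addresses_reference.refid_addresses = addresses.id
) sub
WHERE rn = 1;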

On Mar 12, 2012, at 3:35 PM, Alexander Reichstadt wrote:

> Hi,
> 
> the following statement worked on mysql but gives me an error on postgres:
> 
> column "addresses.address1" must appear in the GROUP BY clause or be used in 
> an aggregate function
> 
> I guess I am doing something wrong. I read the web answers, but none of them 
> seem to meet my needs:
> 
> SELECT 
> companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
>  FROM companies JOIN addresses_reference ON 
> companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON 
> addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;
> 
> 
> What I did now was create a view based on above statement but without 
> grouping. This returns a list with non-distinct values for all companies that 
> have more than one address, which is correct. But in some cases I only need 
> one address and the problem is that I cannot use distinct.
> 
> I wanted to have some way to display a companies list that only gives me the 
> first stored addresses related, and disregard any further addresses.
> 
> Is there any way to do this?
> 
> Thanks
> Alex



Re: [GENERAL] GROUP BY or alternative means to group

2012-03-12 Thread Alexander Reichstadt
Thanks, I just posted my response to my own question for the archives. I take 
it also that group by is faster than distinct on. If it is a substantial 
performance gain I will have to work on this some more. A subquery, I would expect, 
would be too much of a drag, so for all keystroke-updated list tables this would 
not be suitable, I think.



Am 12.03.2012 um 21:57 schrieb Bartosz Dmytrak:

> Hi,
> You can use one of windowing function:
> http://www.postgresql.org/docs/9.1/static/tutorial-window.html 
> http://www.postgresql.org/docs/9.1/static/functions-window.html
> this could be rank() in subquery or first_value(vale any), but there could be 
> performance issue
> 
> another solution could be boolean flag "default" in table address_reference 
> which should be unique for single company, I mean value true should be unique 
> - this could be reached by unique partial index on column refid_companies 
> with condition default = true
> http://www.postgresql.org/docs/9.1/static/indexes-partial.html#INDEXES-PARTIAL-EX3
> 
> hope Your pg version supports windowing functions (as I remember 8.4 and 
> above)
> 
> Of course there is a solution with subquery which finds min id in table 
> addresses of each refid_companies in table addresses_reference and this 
> subquery is joined with companies table, but I am afraid this is not the best 
> one.
> 
> Regards,
> Bartek
> 
> 
> 2012/3/12 Alexander Reichstadt 
> Hi,
> 
> the following statement worked on mysql but gives me an error on postgres:
> 
> column "addresses.address1" must appear in the GROUP BY clause or be used in 
> an aggregate function
> 
> I guess I am doing something wrong. I read the web answers, but none of them 
> seem to meet my needs:
> 
> SELECT 
> companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
>  FROM companies JOIN addresses_reference ON 
> companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON 
> addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;
> 
> 
> What I did now was create a view based on above statement but without 
> grouping. This returns a list with non-distinct values for all companies that 
> have more than one address, which is correct. But in some cases I only need 
> one address and the problem is that I cannot use distinct.
> 
> I wanted to have some way to display a companies list that only gives me the 
> first stored addresses related, and disregard any further addresses.
> 
> Is there any way to do this?
> 
> Thanks
> Alex
> 



Re: [GENERAL] GROUP BY or alternative means to group

2012-03-12 Thread Scott Marlowe
On Mon, Mar 12, 2012 at 1:35 PM, Alexander Reichstadt  wrote:
> Hi,
>
> the following statement worked on mysql but gives me an error on postgres:
>
> column "addresses.address1" must appear in the GROUP BY clause or be used in
> an aggregate function
>
> I guess I am doing something wrong. I read the web answers, but none of them
> seem to meet my needs:
>
> SELECT
> companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
> FROM companies JOIN addresses_reference ON
> companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON
> addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;
>
>
> What I did now was create a view based on above statement but without
> grouping. This returns a list with non-distinct values for all companies
> that have more than one address, which is correct. But in some cases I only
> need one address and the problem is that I cannot use distinct.
>
> I wanted to have some way to display a companies list that only gives me the
> first stored addresses related, and disregard any further addresses.
>
> Is there any way to do this?

If you don't care which address you get, you can use max(address) or
min(address).

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search and ILIKE type clauses.

2012-03-12 Thread Tim Uckun
> It is my understanding that since the extention citext is available that
> this gives you what your asking for and at least at this point isn't
> going to be part of the core.
>

For me it's more of a workaround than a solution but yes probably good
enough. Collation is more subtle than case insensitive comparisons but
for english anyway it should be good enough.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search and ILIKE type clauses.

2012-03-12 Thread Pavel Stehule
2012/3/12 Tim Uckun :
>> It is my understanding that since the extention citext is available that
>> this gives you what your asking for and at least at this point isn't
>> going to be part of the core.
>>
>
> For me it's more of a workaround than a solution but yes probably good
> enough. Collation is more subtle than case insensitive comparisons but
> for english anyway it should be good enough.

Postgres uses system locales - so theoretically you can write your own
case-insensitive locale.

Regards

Pavel Stehule

>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Upgrade questions

2012-03-12 Thread John R Pierce

On 03/12/12 1:25 PM, Tim Uckun wrote:

create a new bigint field.
copy all the IDs to it.
index it in the background
at frequency of your choosing sync the id field to the new field to keep it up.
at a time of your choosing set the default for the new field to be
serial starting at max(id)
drop the ID field
rename the field to id


if there are other tables that have FK references to this table's ID, that 
could be problematic.






--
john r pierceN 37, W 122
santa cruz ca mid-left coast


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] GROUP BY or alternative means to group

2012-03-12 Thread Alexander Reichstadt

I guess I lack the knowledge to integrate your answer into my query. Actually, 
I'd prefer to always see the first address entered unless there is a 
where-clause added. Not sure how this works out then, and I haven't tested it. But 
given the initial query extended by distinct on, it would be like so:

>> SELECT distinct on (companies.id)
>> companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
>> FROM companies JOIN addresses_reference ON
>> companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON
>> addresses_reference.refid_addresses=addresses.id GROUP BY companies.id
where addresses.city ILIKE '%bla%'

I guess postgres would do the sensible thing and deliver the entry matching 'bla' when the 
where-clause is present, and without the where-clause disregard the 'bla' entry and return an 
arbitrary address associated with the company.

But where would I insert the max(address) piece?




Am 12.03.2012 um 22:09 schrieb Scott Marlowe:

> On Mon, Mar 12, 2012 at 1:35 PM, Alexander Reichstadt  wrote:
>> Hi,
>> 
>> the following statement worked on mysql but gives me an error on postgres:
>> 
>> column "addresses.address1" must appear in the GROUP BY clause or be used in
>> an aggregate function
>> 
>> I guess I am doing something wrong. I read the web answers, but none of them
>> seem to meet my needs:
>> 
>> SELECT
>> companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
>> FROM companies JOIN addresses_reference ON
>> companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON
>> addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;
>> 
>> 
>> What I did now was create a view based on above statement but without
>> grouping. This returns a list with non-distinct values for all companies
>> that have more than one address, which is correct. But in some cases I only
>> need one address and the problem is that I cannot use distinct.
>> 
>> I wanted to have some way to display a companies list that only gives me the
>> first stored addresses related, and disregard any further addresses.
>> 
>> Is there any way to do this?
> 
> If you don't care which address you get, you can use max(address) or
> min(address).
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave. How to compact it?

2012-03-12 Thread Nur Hidayat
If I didn't change the data type to character varying, vacuuming the database 
didn't reduce the database size.



.::.
Sent from my BlackBerry®
powered by The ESQ Way 165

-Original Message-
From: Alban Hertroys 
Date: Mon, 12 Mar 2012 16:43:49 
To: 
Cc: John R Pierce; 
Subject: Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave.
 How to compact it?

On 12 March 2012 09:20, Nur Hidayat  wrote:
> FYI, after I changed text field into character varying, I vaccuum the whole 
> database, resulting in much smaller database size

What I think that happened in your case is that because of the
data-type change every row in the table got rewritten to a new version
where said column was of the new type. The subsequent vacuum then
removed the old (bloated) rows with the old type from the database
file.

And thus you ended up with a clean table.

> -Original Message-
> From: "Nur Hidayat" 
> Date: Mon, 12 Mar 2012 08:18:09
> To: John R Pierce; 
> Reply-To: hidayat...@gmail.com
> Subject: Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave. 
> How to compact it?
>
> Yes, I am aware of that, but that's the fact I'm facing
> Right now I'am happy enough my system runs well without eating up my drive :)
> I'll investigate more later when time available :)
>
> Cheers,
> Nur Hidayat
>
>
>
> .::.
> Sent from my BlackBerry®
> powered by The ESQ Way 165
>
> -Original Message-
> From: John R Pierce 
> Sender: pgsql-general-owner@postgresql.orgDate: Mon, 12 Mar 2012 00:39:28
> To: 
> Subject: Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave.
>  How to compact it?
>
> On 03/12/12 12:06 AM, Nur Hidayat wrote:
>>
>> I once have the same problem. In my case it's because most of my table
>> using text datatype.
>> When I change the field type to character varying (1000) database size
>> reduced significantly
>>
>> Unfortunately, I haven't investigate more, but it looks like how
>> postgres stores data
>
> that doesn't make any sense.   text and character varying storage is
> exactly hte same, the only difference is the varchar has an optional
> length constraint
>
>
>
> --
> john r pierce                            N 37, W 122
> santa cruz ca                         mid-left coast
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] GROUP BY or alternative means to group

2012-03-12 Thread Scott Marlowe
On Mon, Mar 12, 2012 at 3:19 PM, Alexander Reichstadt  wrote:
> But where would I insert the max(address) piece?
>

Just put max() or min() around any field in the select list that's not
in the group by clause
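(A sketch with the thread's query; note that which address you then get is whichever sorts highest, not necessarily the first one entered, and the max of each column may come from different address rows.)

SELECT companies.id, companies.name,
       max(addresses.address1) AS address1,
       max(addresses.city)     AS city,
       max(addresses.zip)      AS zip
FROM companies
JOIN addresses_reference ON companies.id = addresses_reference.refid_companies
LEFT JOIN addresses ON addresses_reference.refid_addresses = addresses.id
GROUP BY companies.id, companies.name;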

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] A 154 GB table swelled to 527 GB on the Slony slave. How to compact it?

2012-03-12 Thread John R Pierce

On 03/12/12 2:28 PM, Nur Hidayat wrote:

If I didn't change the data type to character varying, vacuuming the database 
didn't reduce the database size.


changing the data type required every tuple to get rewritten.  a 
vacuum full, or a cluster, likely would have done the same or better 
reduction in size.
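(For reference, the equivalent maintenance commands; 'bigtable' and 'some_index' are placeholder names.)

VACUUM FULL bigtable;               -- compacts the table and returns freed space to the filesystem
CLUSTER bigtable USING some_index;  -- rewrites the table in index order; needs roughly the table's size in free space
REINDEX TABLE bigtable;             -- rebuilds bloated indexes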




--
john r pierceN 37, W 122
santa cruz ca mid-left coast


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Matching on keyword or phrases within a field that is delimited with an "or" operator "|"

2012-03-12 Thread Jim Ostler
I have a table that is around 20 GB, so I need to optimize as best as possible 
the matching with another table on keywords across multiple fields. I have 
around 10 fields that have keywords or phrases delimited with the "or" 
operator  "|". So it would be in the form of  "a | b  |  and jack  | cd" . 
There are around 20 keywords or phrases per field, and these keywords could be 
any word. 

Because of the size of the database, using a "like" match would take too long. I 
am not sure if tsvector would work, or if there is a way to indicate how you 
want it parsed?

If I could index these fields somehow that would be best, but I don't want to 
do the traditional full text indexing as I only want to match whatever is 
between the " | " whether it is one word or more.

The original use of this was as it appears, to have the field "a |  b |  c" be 
read "a or b or c" etc. If there is a way to match using this type of logic 
with an index that would be great. 

I hope this is clear enough. Thanks for any help as I am fairly new at this so 
any direction would be helpful.

--Jim

Re: [GENERAL] Upgrade questions

2012-03-12 Thread Carson Gross
Tim,

Commando.  I like it.  Thanks a ton for that suggestion.  I'd still like to
hear if anyone has a good way to estimate the performance of these
operations, but I'll explore what it would mean to do exactly that.

John: thankfully this is a table without any fks in, although it is indexed
to hell.  I was concerned about the speed of updating the indexes, but with
Tim's suggestion we could recalculate everything in the background.

We are also considering sharding the table and maybe the right thing is to
simply fix it when we do the sharding.

Thanks for the tips guys,
Carson

P.S. A side question: this table of ours is under a fairly constant insert
load, and is read infrequently but rather violently (e.g. a group by over,
say 1-10k rows.)  Is that a bad access pattern?


On Mon, Mar 12, 2012 at 2:18 PM, John R Pierce  wrote:

> On 03/12/12 1:25 PM, Tim Uckun wrote:
>
>> create a new bigint field.
>> copy all the IDs to it.
>> index it in the background
>> at frequency of your choosing sync the id field to the new field to keep
>> it up.
>> at a time of your choosing set the default for the new field to be
>> serial starting at max(id)
>> drop the ID field
>> rename the field to id
>>
>
> if there's other tables that have FK references to this table's ID, that
> could be problematic.
>
>
>
>
>
> --
> john r pierceN 37, W 122
> santa cruz ca mid-left coast
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/**mailpref/pgsql-general
>


Re: [GENERAL] Interesting article, Facebook woes using MySQL

2012-03-12 Thread Stefan Keller
Hi all

2011/7/12 Chris Travers :
> I am not convinced that VoltDB is a magic bullet either.  I don't

I have the chance to help prepare an interview with Mike
Stonebraker to be published at www.odbms.org.
I'd really like to know whether he is up to date on how Postgres performs
these days, and how he thinks VoltDB overcame the overhead he
claims exists in "old elephants".
Do you all have more questions for Mike?

--Stefan

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Matching on keyword or phrases within a field that is delimited with an "or" operator "|"

2012-03-12 Thread David Johnston

>> From: pgsql-general-ow...@postgresql.org
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Jim Ostler
>> Sent: Monday, March 12, 2012 6:57 PM
>> To: pgsql-general@postgresql.org
>> Subject: [GENERAL] Matching on keyword or phrases within a field that is
delimited with an "or" operator "|"
>>
>> I have a table that is around 20 GB, so I need to optimize as best as
possible the matching with another table on keywords across multiple fields.
I have around 10 fields that have keywords or phrases delimited with the
"or" 
>> operator  "|". So it would be in the form of  "a | b  |  and jack  | cd"
. There are around 20 keywords or phrases per field, and these keywords
could be any word. 
>>
>> Because of the size of the database suing a "like" match would take too
long. I am not sure if tsvector would work, or if there is a way to indicate
how you want it parsed? 
>>
>> If I could index these fields somehow that would be best, but I don't
want to do the traditional full text indexing as I only want
to match whatever is between the " | " whether it is one word or more.
>>
>> The original use of this was as it appears, to have the field "a  |  b 
|  c" be read "a or b or c" etc. If there is a way to match using this type
of logic with an index that would be great. 
>>
>> I hope this is clear enough. Thanks for any help as I am fairly new at
this so any direction would be helpful.
>>
>> --Jim

=

Start with this:

SELECT 'a' = ANY(regexp_split_to_array('a|b|c', '\|'));  -- In this query
the "ANY" is providing the OR capability; use "ALL" for AND

and adapt as needed.

Regular Expressions are friendly creatures - everybody should have at least
one.

Given the lack of an example, working query that currently does
what you want, it is hard to provide suggestions on improvements.  Whether
the above is even useful for you I do not know, due to the lack of details.

David J.
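(A possible follow-up sketch, an editor's illustration rather than something from the thread: an expression index over the split keyword array lets an array-containment test use an index. The table and column names are hypothetical; the E'...' regex also strips the spaces around the '|' separators.)

CREATE INDEX big_table_kw1_idx
    ON big_table USING gin (regexp_split_to_array(keywords1, E'\\s*\\|\\s*'));

SELECT *
FROM big_table
WHERE regexp_split_to_array(keywords1, E'\\s*\\|\\s*') @> ARRAY['jack'];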






-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Calculated update

2012-03-12 Thread Bret Stern

Trying to update a varchar numeric string column
by converting it to int, adding a numeric value, and inserting it back
as a varchar.

Having trouble with the cast.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Calculated update

2012-03-12 Thread Rob Sargent

On 03/12/2012 06:28 PM, Bret Stern wrote:

Trying to update a varchar numeric string column
by converting it to int, adding a numeric value, and inserting it back
as a varchar.

Having trouble with the cast.


Possibly having trouble with two casts:  one from string to int, one 
from int to string?  Your sql would help.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Matching on keyword or phrases within a field that is delimited with an "or" operator "|"

2012-03-12 Thread Martin Gregorie
On Mon, 2012-03-12 at 15:57 -0700, Jim Ostler wrote:
> I have a table that is around 20 GB, so I need to optimize as best as
> possible the matching with another table on keywords across multiple
> fields. I have around 10 fields that have keywords or phrases
> delimited with the "or" operator  "|". So it would be in the form of
>  "a | b  |  and jack  | cd" . There are around 20 keywords or phrases
> per field, and these keywords could be any word. 
> 
How static is the list of keywords?
Is there any significance in their order? (I'm guessing there isn't).
How many rows are there in the table, IOW how big is each row?

I wonder if putting the keywords in a separate table with a many:many
relationship with your big table would help. This would retain your
ability to add or remove keywords without affecting the schema. The
selection rules used to combine keywords would also run fast. 

I'm getting good performance from a mail archive where messages are
selected using:
- a 1:M relationship with a subject table
- a M:M relationship with an address table
- range comparison with the date sent (non-indexed field in the message)
- and ILIKE(%phrase%) comparison with the message text

The query is assembled using WHERE clauses for each of the four
selection possibilities listed above before being prepared and issued. A
where clause is only included if the user has specified a value for it.
There is an AND relationship between the clauses. Currently there are
around 130,000 messages in the database. Speed: it took 0.6 seconds to
find 209 messages to or from an address in the last 5 years and
containing the keyword 'f1a'. The same search without the date range
specified ran in 0.86 seconds. It took 16.9 seconds to find the 12,331
messages containing 'f1a' when no address or date range was specified.
This is on a dual core, 3.2 GHz Athlon with 4GB of RAM using PG 9.1. 
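
For illustration only (the table and column names here are invented, not
my actual schema), a query with all four clauses supplied might look like:

SELECT m.*
FROM   message m
JOIN   message_address ma ON ma.message_id = m.message_id
JOIN   address a          ON a.address_id = ma.address_id
WHERE  m.subject_id = 42                              -- subject, 1:M
  AND  a.email = 'someone@example.org'                -- address, M:M
  AND  m.date_sent >= now() - interval '5 years'      -- date range
  AND  m.body ILIKE '%f1a%';                          -- phrase match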

The other possibility is to use a star schema with each keyword being a
dimension - IOW a traditional data warehouse set-up, but of course this
does require a static list of keywords to be defined.


Martin






Re: [GENERAL] Calculated update

2012-03-12 Thread Bosco Rama
Bret Stern wrote:
> 
> trying to update a varchar numeric string column
> by converting it to int, adding a numeric value and insert it back
> as a varchar
> 
> Having trouble with cast

I assume you are doing an update as opposed to an insert.  You use
both above (and both numeric and int as well).  Anyway, try:

update tbl set col = (col::int + 1);

or some variation thereof.  That should do the trick.
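
If the column really is varchar and the server complains about assigning
the integer result back into it, a variant with an explicit cast back
(tbl and col are placeholders) should cover it:

update tbl set col = (col::int + 1)::varchar;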

Bosco.



Re: [GENERAL] Calculated update

2012-03-12 Thread Bret Stern
On Mon, 2012-03-12 at 17:39 -0700, Bosco Rama wrote:
> Bret Stern wrote:
> > 
> > trying to update a varchar numeric string column
> > by converting it to int, adding a numeric value and insert it back
> > as a varchar
> > 
> > Having trouble with cast
> 
> I assume you are doing an update as opposed to an insert.  You use
> both above (and both numeric and int as well).  Anyway, try:
> 
> update tbl set col = (col::int + 1);
> 
> or some variation thereof.  That should do the trick.
> 
> Bosco.

update open_orderheader
set order_id = (order_id::int + 3000)
where module_id='aggregate'

worked as advertised.
thanks for the help guys.





Re: [GENERAL] Upgrade questions

2012-03-12 Thread John R Pierce

On 03/12/12 5:01 PM, Carson Gross wrote:


We are also considering sharding the table and maybe the right thing 
is to simply fix it when we do the sharding.


postgres generally calls that partitioning...   Sharding usually means 
splitting data across multiple servers.
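
For reference, a bare-bones sketch of partitioning as it is typically done
with table inheritance plus CHECK constraints (all names here are made up):

CREATE TABLE events (
    id          bigint,
    created_at  date  NOT NULL
);

CREATE TABLE events_2012_03 (
    CHECK (created_at >= '2012-03-01' AND created_at < '2012-04-01')
) INHERITS (events);

-- With constraint_exclusion = partition, queries that filter on
-- created_at only scan the child tables whose CHECK constraints can match.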




--
john r pierce                          N 37, W 122
santa cruz ca                          mid-left coast




[GENERAL] COPY and indices?

2012-03-12 Thread François Beausoleil
Hi all,  

When using COPY FROM STDIN to stream thousands of rows (20k and more hourly), 
what happens with indices? Are they updated only once after the operation, or 
are they updated once per row? Note that I'm not replacing the table's data: 
I'm appending to what's already there. I suspect batching writes will be faster 
than writing each individual row using an INSERT statement.

Currently, I can sustain 30-40 writes per second on a Rackspace VPS. I know 
it's not the ideal solution, but that's what I'm working with. Following 
vmstat, the server is spending 30 to 40% of its time in iowait. I don't have 
measurements as to what files are touched, and I'd welcome suggestions to 
measure the time PostgreSQL actually spends writing indices vs data.

Thanks!
François





Re: [GENERAL] COPY and indices?

2012-03-12 Thread Ondrej Ivanič
Hi,

On 13 March 2012 15:11, François Beausoleil  wrote:
> When using COPY FROM STDIN to stream thousands of rows (20k and more hourly), 
> what happens with indices? Are they updated only once after the operation, or 
> are they updated once per row? Note that I'm not replacing the table's data: 
> I'm appending to what's already there. I suspect batching writes will be 
> faster than writing each individual row using an INSERT statement.

Yes, it will be faster to use COPY than plain INSERTs. We have a similar
situation -- up to 10k rows every 3 minutes and around 15 indexes on
the table. The table is partitioned and we do not update data. A
check-bloat query reports some bloat, but it grows very slowly, and there
is a new partition every month.

>
> Currently, I can sustain 30-40 writes per second on a Rackspace VPS. I know 
> it's not the ideal solution, but that's what I'm working with. Following 
> vmstat, the server is spending 30 to 40% of its time in iowait. I don't have 
> measurements as to what files are touched, and I'd welcome suggestions to 
> measure the time PostgreSQL actually spends writing indices vs data.

Drop all indexes, measure the time to insert, and collect the iostat output.
Then recreate the indexes, repeat the process, and compare the results.
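
A rough sketch of that comparison from psql (the index, table, and file
names below are placeholders):

\timing
DROP INDEX IF EXISTS bigtable_created_at_idx;    -- repeat for each index
\copy bigtable from '/tmp/batch1.csv' csv

CREATE INDEX bigtable_created_at_idx ON bigtable (created_at);
\copy bigtable from '/tmp/batch2.csv' csv

Compare the reported times for the two loads, and watch iostat/vmstat in
another terminal while each one runs.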

-- 
Ondrej Ivanic
(ondrej.iva...@gmail.com)



Re: [GENERAL] COPY and indices?

2012-03-12 Thread Scott Marlowe
2012/3/12 François Beausoleil :
> Hi all,
>
> When using COPY FROM STDIN to stream thousands of rows (20k and more hourly), 
> what happens with indices? Are they updated only once after the operation, or 
> are they updated once per row? Note that I'm not replacing the table's data: 
> I'm appending to what's already there. I suspect batching writes will be 
> faster than writing each individual row using an INSERT statement.

Copy is faster, but if a single row fails (dup key etc) the whole copy fails.



Re: [GENERAL] COPY and indices?

2012-03-12 Thread hamann . w

Scott Marlowe wrote:
> 2012/3/12 François Beausoleil :
> > Hi all,
> >
> > When using COPY FROM STDIN to stream thousands of rows (20k and more hourly), 
> > what happens with indices? Are they updated only once after the operation, or 
> > are they updated once per row? Note that I'm not replacing the table's data: 
> > I'm appending to what's already there. I suspect batching writes will be 
> > faster than writing each individual row using an INSERT statement.
> 
> Copy is faster, but if a single row fails (dup key etc) the whole copy fails.

Hi Scott,

my impression: it would be even faster to drop the indices, do the bulk copy,
and rebuild the indices after the fact.

Regards
Wolfgang Hamann
