[GENERAL] Ambiguous order by?

2013-05-22 Thread Cody Cutrer
I'm not sure if this is a bug, or something I'm not understanding. When I
have a column referenced in the select both fully qualified and not fully
qualified, an explicit inner join, and that column in the order by
(unqualified, and not in a function call), it complains that is ambiguous:

create table test1 (id integer, sortable_name varchar);
create table test2 (id integer, test1_id integer);
select test1.sortable_name, sortable_name from test1 inner join test2
on test1.id=test1_id order by sortable_name;


ERROR:  ORDER BY "sortable_name" is ambiguous
LINE 1: ...t1 inner join test2 on test1.id=test1_id order by sortable_n...

All of these work:

select test1.sortable_name, sortable_name from test1 order by
sortable_name /* no join */
select test1.sortable_name, sortable_name from test1, test2 where
test1.id=test1_id order by sortable_name; /* implicit join */
select test1.sortable_name from test1 inner join test2 on
test1.id=test1_id order by sortable_name /* only one sortable_name in
select */
select test1.sortable_name, sortable_name from test1 inner join test2
on test1.id=test1_id order by test1.sortable_name /* qualify
sortable_name in order, but not select */
select test1.sortable_name, test1.sortable_name from test1 inner join
test2 on test1.id=test1_id order by sortable_name /* qualify
sortable_name both selects */
select sortable_name, sortable_name from test1 inner join test2 on
test1.id=test1_id order by sortable_name /* unqualified everywhere */
select test1.sortable_name, sortable_name from test1 inner join test2
on test1.id=test1_id order by substring(sortable_name,1,2); /* use a
function in the order */


I've tried this on 9.1.4, 9.1.9, and 9.2.3.

The actual application usage looks more like SELECT users.*,
sortable_name FROM users 
ORDER BY sortable_name. The application code always appends
sortable_name to the select list because, depending on available
features, sortable_name might be a function call and in a GROUP BY.

Thanks for any insight,

Cody Cutrer


Re: [GENERAL] Ambiguous order by?

2013-05-22 Thread Cody Cutrer
Okay, so why does wrapping the order by in a function fix it? (or not doing
a join, or doing an implicit join)

Cody Cutrer


On Wed, May 22, 2013 at 11:36 AM, Tom Lane  wrote:

> Cody Cutrer  writes:
> > create table test1 (id integer, sortable_name varchar);
> > create table test2 (id integer, test1_id integer);
> > select test1.sortable_name, sortable_name from test1 inner join test2
> > on test1.id=test1_id order by sortable_name;
> > ERROR:  ORDER BY "sortable_name" is ambiguous
> > LINE 1: ...t1 inner join test2 on test1.id=test1_id order by
> sortable_n...
>
> I think it's unhappy because "sortable_name" could refer to either of
> the output columns (under the old SQL92 convention that an ORDER BY item
> is an output column name).  Probably the easiest way to dodge that is to
> qualify the name, ie ORDER BY test1.sortable_name.  A different line of
> attack is to use AS to relabel whichever output column you don't want to
> match.
>
> regards, tom lane
>


[GENERAL] Initing a new replica

2012-06-26 Thread Cody Cutrer
I've got a few questions about initing a new replica. We have a
modestly large DB cluster with a master and two replicas running with
streaming replication. We tend to switch which one is the master
fairly often, shuffling hardware, upgrading kernels, etc.  However,
every time we fail over, we have to re-init the old master as a new
replica from scratch using pg_basebackup. pg_basebackup's
documentation mentions copying the basebackup from one replica to
another, but doesn't really go into details. So I'm wondering if any
of the following would be valid ways to get the old master acting as a
replica against the new master more quickly:

 * Assuming the old master stops prior to the new master exiting
recovery, and there is no timeline divergence, simply copying the
.history file from pg_xlogs, creating a recovery.conf, and starting
postgres (this is similar to how we change the non-participating
replica to stream from the new master instead of the old master - copy
the .history file, alter recover.conf, and restart postgres)
 * Instead of using pg_basebackup, manually call pg_start_backup and
pg_stop_backup against the new master, and rsync the data over, since
presumably little has changed since a timeline divergence
 * Instead of using pg_basebackup, manually call pg_start_backup and
pg_stop_backup (against the new master? or the non-participating
slave? not necessary?), and rsync or raw copy the data over from the
non-participating slave, reducing load on the new master.

Thanks for any help,

Cody Cutrer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Initing a new replica

2012-06-26 Thread Cody Cutrer
After thinking some more, an expansion on the third possibility:

 * Call pg_start_backup on the master. Checkpoint the
non-participating replica. Copy data dirs from non-participating
replica. Copy xlogs from master. Call pg_stop_backup on the master.
Copy final xlogs from master. This would move the bulk of the I/O to
the non-participating replica, while still doing the "critical" parts
of the backup against the master.

Cody Cutrer

On Tue, Jun 26, 2012 at 10:04 PM, Cody Cutrer  wrote:
> I've got a few questions about initing a new replica. We have a
> modestly large DB cluster with a master and two replicas running with
> streaming replication. We tend to switch which one is the master
> fairly often, shuffling hardware, upgrading kernels, etc.  However,
> every time we fail over, we have to re-init the old master as a new
> replica from scratch using pg_basebackup. pg_basebackup's
> documentation mentions copying the basebackup from one replica to
> another, but doesn't really go into details. So I'm wondering if any
> of the following would be valid ways to get the old master acting as a
> replica against the new master more quickly:
>
>  * Assuming the old master stops prior to the new master exiting
> recovery, and there is no timeline divergence, simply copying the
> .history file from pg_xlogs, creating a recovery.conf, and starting
> postgres (this is similar to how we change the non-participating
> replica to stream from the new master instead of the old master - copy
> the .history file, alter recover.conf, and restart postgres)
>  * Instead of using pg_basebackup, manually call pg_start_backup and
> pg_stop_backup against the new master, and rsync the data over, since
> presumably little has changed since a timeline divergence
>  * Instead of using pg_basebackup, manually call pg_start_backup and
> pg_stop_backup (against the new master? or the non-participating
> slave? not necessary?), and rsync or raw copy the data over from the
> non-participating slave, reducing load on the new master.
>
> Thanks for any help,
>
> Cody Cutrer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Index on System Table

2012-03-20 Thread Cody Cutrer
I've got a SaaS situation where I'm using 1000+ schemas in a single
database (each schema contains the same tables, just different data
per tenant).  I used schemas so that the shared app servers could
share a connection to the single database for all schemas.  Things are
working fine. However, when using psql, doing \d or trying to use tab
complete takes FOREVER, because it's doing a sequence scan against
pg_class (which has over a million rows), and relying on
pg_table_is_visible to do search_path filtering. I've figured out that
if I add "nspname = ANY(current_schemas(true))" to the query psql is
using, and an index to pg_class on relnamespace, the query optimizer
is able to do an index scan, and the queries return in milliseconds
instead of minutes.  However, I can't actually create an index on
pg_class because it is a system table (I was able to test by copying
it to a temporary table and adding the index there). My question is if
there is a way to create the index on the system table somehow for
just my database, and if not how would the developer community react
to the suggestion of adding an index to a system table in the default
postgres distro.

Thanks,

Cody Cutrer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Indexes on System Table

2012-03-20 Thread Cody Cutrer
I've got a SaaS situation where I'm using 1000+ schemas in a single
database (each schema contains the same tables, just different data
per tenant).  I used schemas so that the shared app servers could
share a connection to the single database for all schemas.  Things are
working fine. However, when using psql, doing \d or trying to use tab
complete takes FOREVER, because it's doing a sequence scan against
pg_class (which has over a million rows), and relying on
pg_table_is_visible to do search_path filtering. I've figured out that
if I add "nspname = ANY(current_schemas(true))" to the query psql is
using, and an index to pg_class on relnamespace, the query optimizer
is able to do an index scan, and the queries return in milliseconds
instead of minutes.  However, I can't actually create an index on
pg_class because it is a system table (I was able to test by copying
it to a temporary table and adding the index there). My question is if
there is a way to create the index on the system table somehow for
just my database, and if not how would the developer community react
to the suggestion of adding an index to a system table in the default
postgres distro.

Thanks,

Cody Cutrer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index on System Table

2012-03-21 Thread Cody Cutrer
Thanks for the tips.  I spent some more time investigating.  It's
definitely pg_table_is_visible that's causing the problem.  A \dt
.* is fairly fast (like you said, it doesn't apply
pg_table_is_visible at all).  I tried adjusting the query in several
ways.  Adding either nspname=ANY(current_schemas(true)) or
relnamespace=ANY() didn't help,
because the query planner still applied pg_table_is_visible to every
row in pg_class.  Doing either of those and *removing* the
pg_table_is_visible query gave the best results.  That may be a good
solution, since for tab complete you don't really care which schema
and object is coming from, just that there is an object.  I'm not sure
about for /dt, though.

Anyhow, I've found a workaround with acceptable (still not "snappy")
performance for all the queries, though.  By running "alter function
pg_table_is_visible (rel oid) cost 50;" the query planner is now
avoiding that function, and doing other filtering first.  The queries
are all a few seconds now, but not multiple minutes.

Cody Cutrer

On Tue, Mar 20, 2012 at 6:06 PM, Tom Lane  wrote:
> Cody Cutrer  writes:
>> I've got a SaaS situation where I'm using 1000+ schemas in a single
>> database (each schema contains the same tables, just different data
>> per tenant). ...
>> if I add "nspname = ANY(current_schemas(true))" to the query psql is
>> using, and an index to pg_class on relnamespace, the query optimizer
>> is able to do an index scan, and the queries return in milliseconds
>> instead of minutes.  However, I can't actually create an index on
>> pg_class because it is a system table (I was able to test by copying
>> it to a temporary table and adding the index there). My question is if
>> there is a way to create the index on the system table somehow for
>> just my database,
>
> There's not really support for adding indexes to system catalogs
> on-the-fly.  I think it would work (barring concurrency issues)
> for most catalogs, but pg_class has special limitations due to
> the "relmapping" infrastructure.  It's not something I'd particularly
> care to try on a production database.
>
>> and if not how would the developer community react
>> to the suggestion of adding an index to a system table in the default
>> postgres distro.
>
> In many (probably most) databases, an index on pg_class.relnamespace
> wouldn't be selective enough to justify its update costs.  I'd want
> to see a lot more than one request for this before considering it.
>
> If you're correct that the main costs come from the pg_table_is_visible
> tests, it should be possible to dodge that without an extra index.
> I'd suggest making a function similar to current_schemas() except it
> returns an OID array instead of names (this should be cheaper anyway)
> and just putting the relnamespace = ANY(current_schema_oids()) condition
> in front of the visibility test.  Or maybe you could dispense with the
> visibility test altogether, depending on your usage patterns.
>
> (BTW, I think that "\d schemaname.*" doesn't involve any visibility
> tests, in case that helps.)
>
>                        regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index on System Table

2012-03-21 Thread Cody Cutrer
That's awesome, thanks! Yeah, I doubt I'll do that to our production
database, but maybe I'll try it on a copy sometime down the line.
Adjusting the cost for pg_table_is_visible is working well enough so
far.

Cody Cutrer

On Wed, Mar 21, 2012 at 12:17 PM, Tom Lane  wrote:
> Cody Cutrer  writes:
>> On Tue, Mar 20, 2012 at 6:06 PM, Tom Lane  wrote:
>>> There's not really support for adding indexes to system catalogs
>>> on-the-fly.  I think it would work (barring concurrency issues)
>>> for most catalogs, but pg_class has special limitations due to
>>> the "relmapping" infrastructure.  It's not something I'd particularly
>>> care to try on a production database.
>
> BTW, I experimented with that a little bit and found that the relmapper
> is not really the stumbling block, at least not after applying this
> one-line patch:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f70f095c9096d5e2689e8d79172b37b57a84e51b
>
> It seemed to work for me to do this:
>
>        stop postmaster
>        start a standalone backend with -c allow_system_table_mods=1
>        create index pg_class_relnamespace_index on pg_class(relnamespace);
>        stop standalone backend, restart postmaster
>
> There are a lot of gotchas here, notably that the session in which you
> create the index won't know it's there (so in this case, a reindex on
> pg_class would likely be advisable afterwards).  I still think you'd be
> nuts to try it on a production database, but ...
>
>                        regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general