Re: [GENERAL] where is the table?

2009-06-22 Thread Albe Laurenz
sergio nogueira wrote:
> dmonitor=> create table wereisthetable(col int);
> CREATE TABLE
> dmonitor=> select tablename, tablespace from pg_tables where 
> tablename='wereisthetable';
>    tablename    | tablespace 
> ----------------+------------
>  wereisthetable | 
> (1 row)
> 
> dmonitor=> alter table wereisthetable set tablespace monitor_dat1;
> ALTER TABLE
> dmonitor=> select tablename, tablespace from pg_tables where 
> tablename='wereisthetable';
>    tablename    | tablespace 
> ----------------+------------
>  wereisthetable | 
> (1 row)
> 
> dmonitor=> select version();
>                          version                           
> ----------------------------------------------------------
>  PostgreSQL 8.4rc1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.3.2 
> 20081105 (Red Hat 4.3.2-7), 32-bit
> (1 row)

Strange, your sample works as expected on my
PostgreSQL 8.4rc1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.3 
20030502 (Red Hat Linux 3.2.3-59), 32-bit

What do you get for

select * from pg_tablespace;

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's wrong with this query?

2009-06-22 Thread Albe Laurenz
Thomas Kellerer wrote:
>> CREATE TABLE test
>> (
>>   value uuid
>> );
>> 
>> INSERT INTO test VALUES ('----');
>> INSERT INTO test VALUES ('----');
>> INSERT INTO test VALUES (null);
>> 
>> select * from test where value != '----';
>> 
>> What I expect to get is two rows: the 
>> '----' row and the null row, as both 
>> those values are in fact not '----'.  
>> However, I only get the first one.
>
> That is standard behaviour.
> A comparison with a NULL value always returns false (and that
> is not a Postgres speciality).

Sorry to be nitpicking, but maybe in that case it adds to clarity:

A comparison with NULL does not return FALSE, but "undefined" or NULL.

Try to run the following queries:

SELECT 1 = 2;
and
SELECT 1 = NULL;

and observe the different result.

In the context of the original question this difference does not matter,
because a comparison is considered successful only if it returns TRUE.

But I think this way it becomes clearer *why* neither = nor != will
succeed for a NULL (= undefined) value: if you don't know which value
a certain thing has, you can neither say that it is equal to 1 nor
that it is not equal to 1.
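To see the three-valued logic from a client, here is a small sketch using
Python's sqlite3 module as a stand-in engine (SQLite follows the same standard
NULL semantics; its NULL-aware operator is IS NOT, where PostgreSQL spells it
IS DISTINCT FROM):

```python
import sqlite3  # stand-in engine; SQLite follows the same SQL NULL logic

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# An ordinary comparison yields FALSE (0); comparing with NULL yields NULL (None).
print(cur.execute("SELECT 1 = 2").fetchone()[0])     # 0
print(cur.execute("SELECT 1 = NULL").fetchone()[0])  # None

cur.execute("CREATE TABLE test (value TEXT)")
cur.executemany("INSERT INTO test VALUES (?)", [("a",), ("b",), (None,)])

# WHERE keeps only rows where the condition is TRUE, so the NULL row vanishes.
print(cur.execute("SELECT value FROM test WHERE value != 'a'").fetchall())

# NULL-aware comparison: SQLite's IS NOT; PostgreSQL's IS DISTINCT FROM.
print(cur.execute("SELECT value FROM test WHERE value IS NOT 'a'").fetchall())
```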

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] temp tables, sessions, pgpool and disk

2009-06-22 Thread Ivan Sergio Borgonovo
On Mon, 22 Jun 2009 07:26:56 +0800
Craig Ringer  wrote:

> > http://www.postgresql.org/docs/8.3/interactive/sql-createtable.html
> > "If specified, the table is created as a temporary table.
> > Temporary tables are automatically dropped at the end of a
> > session"
> > 
> > I'd interpret it as a connection.
> 
> Correctly.
> 
> > I don't even know if it is possible to send more than one command
> > over a single connection and wait for the results asynchronously.
> > Any clarification?
> 
> To an extent cursors provide that ability. The "result" is returned
> quite promptly, but it's a placeholder that allows you to retrieve
> the real results progressively as you need them. Whether the
> database generates the results immediately and stores them to
> return later, or whether it generates them on demand, isn't
> something you can easily tell; it's up to the database.

http://www.postgresql.org/docs/8.3/interactive/libpq-async.html
"PQsendQuery cannot be called again (on the same connection) until
PQgetResult has returned a null pointer, indicating that the command
is done."

So asynchronous calls can't be used to run postgres queries in parallel
on the same "session"; they just let the client and server work in
parallel.

So a temp table seems as private as I need it.

I wonder what will happen if I put something like pgpool between
postgresql and a web app.

> > So postgresql actually issues writes to disk and delegate to the
> > OS management of the cache/actual write on disk.

> Yes.

> > I thought it could just try to hold them in RAM and still
> > delegate to the OS to save them on disk in swap if the system is
> > short on RAM.

> For a variety of reasons, you REALLY don't want it to work that
> way.

mmm... first sorry for the noise...
Interpret the following as reality checks.
I'm perfectly aware building up a DB is not easy, and I'm not
pretending I know how to write one. ;)

> OS memory managers tend to be _much_ better and faster at managing
> pages that're backed by a file. They'll write dirty data out
> pre-emptively so that execution doesn't stall when memory runs
> low; they write data to the file in order for best disk
> performance; they efficiently buffer and read-ahead when pulling
> the data back in, etc.

> The OS knows much less about what anonymous memory (memory not
> backed by a file) "means" to a program and can't be as clever with
> it. Swapping tends to be _much_ more CPU expensive than writing

But by issuing a write to disk PostgreSQL doesn't actually tell the OS
anything more about what it is placing on the disk and how it is going
to access it... and it is adding overhead to move the data back and
forth, no matter whether that overhead happens in RAM or on disk.
Actually, since temp tables are private to the connection they should
(?) be private to a single postgresql process, so the OS should be
able to do a good job.
I don't see any atomicity constraint, so... if something fails while
writing to RAM, as you said, you shouldn't need a WAL.

> dirty buffers to a file. It's a lot more expensive to retrieve
> from disk, too, and usually involves lots of seeks for quite
> scattered pages instead of nice block readahead.

Once you're running out of memory I see no guarantee your file will
end up in a fast, easily accessible area of your disk... and you're
going to add the overhead associated with a file system
(journalling, permissions/ownership, locks). Swap is volatile... and
it should offer the guarantees you "need" for a temp table.

> The OS knows much better than PostgreSQL does when the table will
> fit in RAM and when it needs to spill to disk, and it's much
> better at managing that than Pg can ever be. It's great that Pg
> just uses the OS's hardware knowledge, system-wide awareness, and
> highly optimised memory manager + disk IO management to take care
> of the problem.

The same should be true for virtual memory, not just file management,
and postgresql has a configuration file that should give the DB a clue
about the expected workload and hardware.
Surely postgresql can't forecast how many temp tables a single
connection will create, or how large they will be... but substantially
I got the idea that a connection is somehow serial in its execution
and that storage could be garbage collected or just released early
(drop table, on commit drop).
This looks as if it is taking temp tables very far from the standard.
And yeah... once you want to do memory management/resource management
inside SQL you've opened the doors of Hell.
But well...

From what I could see of SQL99, the definition of temp tables is very
terse... and a bit confusing (at least for me) about global and
local.
I had a quick look at what's available in MS SQL... and they have
a sort of "in-memory temp table", but you can't modify its schema.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's wrong with this query?

2009-06-22 Thread Thomas Kellerer

Albe Laurenz, 22.06.2009 09:52:

Sorry to be nitpicking, but maybe in that case it adds to clarity:

A comparison with NULL does not return FALSE, but "undefined" or NULL.

Try to run the following queries:

SELECT 1 = 2;
and
SELECT 1 = NULL;

and observe the different result.

In the context of the original question this difference does not matter,
because a comparison is considered successful only if it returns TRUE.

But I think this way it becomes clearer *why* neither = nor != will
succeed for a NULL (= undefined) value: if you don't know which value
a certain thing has, you can neither say that it is equal to 1 nor
that it is not equal to 1.


Good points :)

Thanks for the clarification!

I recently saw a blog talking about interview questions. One of them was:


Under which circumstances does the following query *not* return all rows?

SELECT *
FROM the_table
WHERE some_column = some_column;

boils down to the same behaviour...
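A quick way to convince yourself, using Python's sqlite3 as a stand-in engine
(the NULL row is the culprit: for it, some_column = some_column evaluates to
NULL, not TRUE):

```python
import sqlite3  # stand-in engine; the NULL logic is standard SQL

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE the_table (some_column INTEGER)")
cur.executemany("INSERT INTO the_table VALUES (?)", [(1,), (2,), (None,)])

# For the NULL row the condition evaluates to NULL, not TRUE, so WHERE drops it.
rows = cur.execute(
    "SELECT * FROM the_table WHERE some_column = some_column").fetchall()
print(rows)  # [(1,), (2,)] - the NULL row is missing
```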

Regards
Thomas


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] temp tables, sessions, pgpool and disk

2009-06-22 Thread Martijn van Oosterhout
On Mon, Jun 22, 2009 at 09:53:59AM +0200, Ivan Sergio Borgonovo wrote:
> > The OS knows much less about what anonymous memory (memory not
> > backed by a file) "means" to a program and can't be as clever with
> > it. Swapping tends to be _much_ more CPU expensive than writing
> 
> But by issuing a write to disk PostgreSQL doesn't actually tell the OS
> anything more about what it is placing on the disk and how it is going
> to access it... and it is adding overhead to move the data back and
> forth, no matter whether that overhead happens in RAM or on disk.
> Actually, since temp tables are private to the connection they should
> (?) be private to a single postgresql process, so the OS should be
> able to do a good job.
> I don't see any atomicity constraint, so... if something fails while
> writing to RAM, as you said, you shouldn't need a WAL.

For the record, temp tables are in fact handled differently, in
particular they are not stored in the shared_buffers, but instead are
in backend local (private) buffers, whose size is controlled by
temp_buffers. They are indeed not WAL archived, nor written to disk
unless needed.

So yes, small temp tables will likely stay in memory, but large temp
tables may spill to disk. There's no flushing or syncing so quite
likely they'll end up in the OS disk cache for a while. Once the temp
table is deleted, the file is deleted and the OS throws that data away.
So temp tables most likely won't use any disk I/O, but they *can* if
the need arises.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while 
> boarding. Thank you for flying nlogn airlines.




Re: [GENERAL] temp tables, sessions, pgpool and disk

2009-06-22 Thread Ivan Sergio Borgonovo
On Mon, 22 Jun 2009 11:40:08 +0200
Martijn van Oosterhout  wrote:

> On Mon, Jun 22, 2009 at 09:53:59AM +0200, Ivan Sergio Borgonovo
> wrote:
> > > The OS knows much less about what anonymous memory (memory not
> > > backed by a file) "means" to a program and can't be as clever
> > > with it. Swapping tends to be _much_ more CPU expensive than
> > > writing
> > 
> > But by issuing a write to disk PostgreSQL doesn't actually tell
> > the OS anything more about what it is placing on the disk and how
> > it is going to access it... and it is adding overhead to move the
> > data back and forth, no matter whether that overhead happens in
> > RAM or on disk. Actually, since temp tables are private to the
> > connection they should (?) be private to a single postgresql
> > process, so the OS should be able to do a good job.
> > I don't see any atomicity constraint, so... if something fails
> > while writing to RAM, as you said, you shouldn't need a WAL.
> 
> For the record, temp tables are in fact handled differently, in
> particular they are not stored in the shared_buffers, but instead
> are in backend local (private) buffers, whose size is controlled by
> temp_buffers. They are indeed not WAL archived, nor written to disk
> unless needed.

> So yes, small temp tables will likely stay in memory, but large
> temp tables may spill to disk. There's no flushing or syncing so
> quite likely they'll end up in the OS disk cache for a while. Once
> the temp table is deleted, the file is deleted and the OS throws
> that data away. So temp tables most likely won't use any disk I/O,
> but they *can* if the need arises.

Just to make it extra-clear to people unaware of pg internals...
since the second paragraph may seem to contradict the first one...

could "nor written to disk unless needed" be rephrased as:
even repeated UPDATEs/INSERTs won't issue writes to the OS (no matter
whether they would end up on disk or not) if the table fits in the
buffer?

I see the default is somewhat "large" (8MB) and it is not
preallocated. Looks nice.

> Have a nice day,

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Graphical representation of query plans

2009-06-22 Thread Viktor Rosenfeld

Hi everybody,

is there a (stand-alone, command line) tool that converts the output  
of EXPLAIN ANALYZE into a tree-like representation of the plan?


Cheers,
Viktor

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Graphical representation of query plans

2009-06-22 Thread Grzegorz Jaśkiewicz
pgadmin does it pretty nicely:
http://pgadmin.org/images/screenshots/pgadmin3_macosx.png
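If you do want a stand-alone command-line tool, the indentation of the "->"
markers in the plan text already encodes the tree, so a small script can
recover it. A rough sketch (the sample plan below is invented for
illustration; real plans carry more detail lines, which the parser simply
skips):

```python
import re

def plan_tree(text):
    """Parse EXPLAIN ANALYZE text into nested [label, children] lists,
    using the indentation of each '->' marker as the nesting depth."""
    root, stack = None, []              # stack holds (indent, node) pairs
    for line in text.splitlines():
        m = re.match(r"(\s*)->\s*(.*)", line)
        if m:
            indent, label = len(m.group(1)), m.group(2)
        elif root is None and line.strip():
            indent, label = -1, line.strip()   # top node carries no '->'
        else:
            continue                    # skip detail lines (Sort Key, timings)
        node = [label, []]
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1][1].append(node)   # attach under the nearest shallower node
        else:
            root = node
        stack.append((indent, node))
    return root

def show(node, depth=0):
    """Print the recovered plan tree with plain indentation."""
    print("    " * depth + node[0])
    for child in node[1]:
        show(child, depth + 1)

# Invented sample plan, trimmed; real EXPLAIN ANALYZE output parses the same way.
plan = """Sort  (cost=158.51..160.83 rows=928) (actual time=1.3..1.4 rows=10 loops=1)
  Sort Key: a.dt
  ->  Hash Join  (cost=1.11..2.24 rows=10) (actual time=0.5..0.9 rows=10 loops=1)
        ->  Seq Scan on a  (cost=0.00..1.10 rows=10)
        ->  Hash  (cost=1.05..1.05 rows=5)
              ->  Seq Scan on b  (cost=0.00..1.05 rows=5)"""

tree = plan_tree(plan)
show(tree)
```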

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] running pg_dump from python

2009-06-22 Thread Dimitri Fontaine

Hi,

Erik Jones  writes:

> On Jun 15, 2009, at 5:17 AM, Jasen Betts wrote:
>
>> On 2009-06-14, Garry Saddington  wrote:
>>> def backup():
>>>import  os
>>>os.popen("c:/scholarpack/postgres/bin/pg_dump scholarpack  >
>>> c:/scholarpack/ancillary/scholarpack.sql")
>>
>> are you sure you're using os.popen correctly?
>> you don't appear to be waiting for the pg_dump process to finish.
>
> Right, the popen stuff should be something like:
>
> p = os.popen("c:/scholarpack/postgres/bin/pg_dump scholarpack  > c:/
> scholarpack/ancillary/scholarpack.sql 2> c:/scholarpack/ancillary/
> dump.err")
> status = p.close()
>
> Then check status to see if the command was successful or not.

Well, use subprocess:

import shlex
import subprocess

def run_command(command, expected_retcodes = 0, stdin = None):
    """run a command and raise an exception if retcode not in
    expected_retcodes"""

    # we want expected_retcodes to be a tuple but will manage integers
    if isinstance(expected_retcodes, int):
        expected_retcodes = (expected_retcodes,)

    # we want the command to be a list, but accommodate when given a string
    cmd = command
    if isinstance(cmd, str):
        cmd = shlex.split(command)

    proc = subprocess.Popen(cmd,
                            stdin  = stdin,
                            stdout = subprocess.PIPE,
                            stderr = subprocess.PIPE)

    out, err = proc.communicate()

    if proc.returncode not in expected_retcodes:
        # when nothing gets to stderr, add stdout to Detail
        if err.strip() == '':
            err = out

        mesg  = 'Error [%d]: %s' % (proc.returncode, command)
        mesg += '\nDetail: %s' % err
        raise Exception(mesg)

    return proc.returncode, out, err
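On recent Python versions the same idea condenses onto subprocess.run; a
sketch (using a plain echo as a stand-in for the pg_dump invocation from the
original post):

```python
import shlex
import subprocess

def run_command(command, expected_retcodes=(0,)):
    """Run a command; raise if the return code is not an expected one."""
    if isinstance(command, str):
        command = shlex.split(command)
    proc = subprocess.run(command, capture_output=True, text=True)
    if proc.returncode not in expected_retcodes:
        detail = proc.stderr.strip() or proc.stdout
        raise RuntimeError("Error [%d]: %s\nDetail: %s"
                           % (proc.returncode, command, detail))
    return proc.returncode, proc.stdout, proc.stderr

# The original post would call something like run_command("pg_dump scholarpack");
# a portable stand-in:
rc, out, err = run_command("echo hello")
print(rc, out.strip())  # 0 hello
```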


Regards,
-- 
dim

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Information about columns

2009-06-22 Thread Dario Teixeira

Hi,

Is there some way to find out meta-information about the columns generated
by any SELECT query?  If the SELECT returns values from a regular table or
a view, I can use pg_class and pg_attribute to get the info I need:

CREATE TABLE foobar (quant int);
SELECT quant FROM foobar;
SELECT attname, attnum FROM pg_attribute, pg_class WHERE attrelid=pg_class.oid 
AND relname='foobar';

But what if the columns belong to a "virtual" table instead, as per the
(silly) example below?

SELECT 1, 2*quant FROM foobar;

Thanks in advance for any help you might give me!
Best regards,
Dario Teixeira





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Information about columns

2009-06-22 Thread Craig Ringer
On Mon, 2009-06-22 at 05:26 -0700, Dario Teixeira wrote:

> Is there some way to find out meta-information about the columns generated
> by any SELECT query?

How are you talking to the database? ODBC? JDBC? LibPQ? Something else?
Or do you want this from within PL/PgSQL ?

You'll usually find that this information is most easily obtained via
your client driver. I don't know if it's exposed at the SQL level, but
it's certainly available from the underlying PostgreSQL network
protocol.
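For example, every DB-API driver in Python exposes cursor.description, with
one entry per output column, even for computed columns; a sketch with the
stdlib sqlite3 driver (psycopg2 reports the same structure against
PostgreSQL, with type OIDs as the type codes):

```python
import sqlite3  # stand-in driver; psycopg2's cursor.description works alike

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE foobar (quant INTEGER)")
cur.execute("INSERT INTO foobar VALUES (3)")

# Even for a "virtual" result, the driver reports one entry per output column.
cur.execute("SELECT 1 AS one, 2 * quant AS twice_quant FROM foobar")
print([col[0] for col in cur.description])  # ['one', 'twice_quant']
```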

--
Craig Ringer


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Information about columns

2009-06-22 Thread Pavel Stehule
2009/6/22 Dario Teixeira :
>
> Hi,
>
> Is there some way to find out meta-information about the columns generated
> by any SELECT query?  If the SELECT returns values from a regular table or
> a view, I can use pg_class and pg_attribute to get the info I need:
>
> CREATE TABLE foobar (quant int);
> SELECT quant FROM foobar;
> SELECT attname, attnum FROM pg_attribute, pg_class WHERE 
> attrelid=pg_class.oid AND relname='foobar';
>
> But what if the columns belong to a "virtual" table instead, as per the
> (silly) example below?
>
> SELECT 1, 2*quant FROM foobar;
>


you can do it at a low level - look at the functions

PQdescribePrepared
PQdescribePortal

regards
Pavel Stehule

> Thanks in advance for any help you might give me!
> Best regards,
> Dario Teixeira
>
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Killing a data modifying transaction

2009-06-22 Thread William Temperley
Hi All,

I've got two transactions I tried to kill 3 days ago using "select
pg_cancel_backend()", then SIGTERM, and have since then been
using 100% of a cpu core each. They were supposed to insert the
results of large unions with PostGIS and appear to have failed.
Could someone tell me what's the least worst option here please? If I
kill -9 will I corrupt my data directory?  (I've searched for an
answer in the FAQ and the wiki, to no avail). I'm on Ubuntu server
8.04.1 and Postgres 8.3.7.

Thanks and regards,

Will Temperley

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] 8.2 instance that won't come up after shutdown

2009-06-22 Thread Abraham, Danny
Hi 

We have a PG version 8.2  instance that won't come up after shutdown. The log 
shows the following:

2009-06-12 21:39:02.669 MDT LOG:  database system was shut down at 2009-06-12 20:56:51 MDT
2009-06-12 21:39:02.748 MDT LOG:  checkpoint record is at 9/D712F330
2009-06-12 21:39:02.748 MDT LOG:  redo record is at 9/D712F330; undo record is at 0/0; shutdown TRUE
2009-06-12 21:39:02.748 MDT LOG:  next transaction ID: 0/8239765; next OID: 25433
2009-06-12 21:39:02.748 MDT LOG:  next MultiXactId: 1; next MultiXactOffset: 0
2009-06-12 21:39:02.838 MDT LOG:  database system is ready
2009-06-12 21:39:02.846 MDT FATAL:  invalid memory alloc request size 8455717278
2009-06-12 21:39:02.862 MDT LOG:  startup process (PID 6198) exited with exit code 1
2009-06-12 21:39:02.864 MDT LOG:  aborting startup due to startup process failure
2009-06-12 21:39:02.928 MDT LOG:  logger shutting down

We tried to set the autovacuum parameter to off, but unfortunately it
did not help.

1. Is this bug present in the 8.2.4 code line also? (We know that it
   is present in 8.1.X.)
2. How can we recreate the problem?
3. Is there a fix for that bug?

Any help will be appreciated.

Best regards,
Danny Abraham


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Killing a data modifying transaction

2009-06-22 Thread Tom Lane
William Temperley  writes:
> I've got two transactions I tried to kill 3 days ago using "select
> pg_cancel_backend()", then SIGTERM, and have since then been
> using 100% of a cpu core each. They were supposed to insert the
> results of large unions with PostGIS and appear to have failed.
> Could someone tell me what's the least worst option here please? If I
> kill -9 will I corrupt my data directory?

No, you'll just lose all your open sessions.

It might be worth trying to identify where they're looping before
you zap them, though.  A stack trace from gdb would help.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Select ranges based on sequential breaks

2009-06-22 Thread Joel Nothman
Hi Mike,

I happened upon your query, which is related to some stuff I've been
playing with.

Firstly, David's solution below doesn't work. I haven't yet tried to work
out why.

Secondly, I was hoping to be able to solve your problem nicely with
Postgres 8.4's window functions [1,2], which can provide access to
data in sequentially-related rows.

Given the following setup:
CREATE TABLE foo (dt date, bin varchar(4));
INSERT INTO foo VALUES ('2009-01-01', 'red'), ('2009-01-02', 'red'),
('2009-01-03', 'blue'), ('2009-01-04', 'blue'), ('2009-01-05', 'blue'),
('2009-01-06', 'red'), ('2009-01-07', 'blue'), ('2009-01-08', 'blue'),
('2009-01-09', 'red'), ('2009-01-10', 'red');

I had hoped the following would suffice:
SELECT first_value(dt) OVER w, last_value(dt) OVER w, bin FROM foo WINDOW
w AS (ORDER BY dt PARTITION BY bin);

Apparently, this is bad syntax. ORDER BY cannot precede PARTITION BY in a
WINDOW declaration, and yet I wanted a grouping of date-consecutive bins,
which (PARTITION BY bin ORDER BY dt) would not give me.

I was able to produce the required result with:

SELECT MIN(dt) AS first, MAX(dt) AS last, MAX(bin) AS bin
  FROM (SELECT dt, bin,
               CASE WHEN newbin IS NULL
                    THEN 0
                    ELSE SUM(newbin) OVER (ORDER BY dt)
               END AS binno
          FROM (SELECT *,
                       (bin != lag(bin, 1) OVER (ORDER BY dt))::int AS newbin
                  FROM foo
               ) AS newbins
       ) AS binnos
 GROUP BY binno
 ORDER BY first;

This relies on a middle step in which I create an enumeration of the bins
in sequence:
SELECT dt, bin, CASE WHEN newbin IS NULL THEN 0 ELSE sum(newbin) OVER
(ORDER BY dt) END AS binno FROM (SELECT *, (bin != lag(bin, 1) OVER (ORDER
BY dt))::int AS newbin FROM foo) AS newbins;
     dt     | bin  | binno
------------+------+-------
 2009-01-01 | red  |     0
 2009-01-02 | red  |     0
 2009-01-03 | blue |     1
 2009-01-04 | blue |     1
 2009-01-05 | blue |     1
 2009-01-06 | red  |     2
 2009-01-07 | blue |     3
 2009-01-08 | blue |     3
 2009-01-09 | red  |     4
 2009-01-10 | red  |     4

I would hope there is a neater way to do this with window functions.
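The enumeration trick itself - start a new group number whenever the bin
changes, then aggregate per group - is easy to check outside the database; a
quick sketch in Python over the same sample data:

```python
from itertools import groupby

rows = [("2009-01-01", "red"), ("2009-01-02", "red"), ("2009-01-03", "blue"),
        ("2009-01-04", "blue"), ("2009-01-05", "blue"), ("2009-01-06", "red"),
        ("2009-01-07", "blue"), ("2009-01-08", "blue"), ("2009-01-09", "red"),
        ("2009-01-10", "red")]

# Mirror of the lag()/SUM() OVER step: bump binno each time bin changes.
binno, prev, numbered = 0, None, []
for dt, bin_ in rows:
    if prev is not None and bin_ != prev:
        binno += 1
    numbered.append((binno, dt, bin_))
    prev = bin_

# Mirror of GROUP BY binno: (MIN(dt), MAX(dt), bin) per run.
ranges = [(grp[0][1], grp[-1][1], grp[0][2])
          for _, g in groupby(numbered, key=lambda r: r[0])
          for grp in [list(g)]]
print(ranges)
```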

The best way to solve your problem may be with PL/pgSQL, which is also good
at dealing with sequences (it's not as complicated as it looks!):

CREATE TYPE bindates AS (first date, last date, bin varchar(5));
CREATE OR REPLACE FUNCTION bindates()
RETURNS SETOF bindates
AS $$
DECLARE
  row record;
  res bindates;
BEGIN
  FOR row IN SELECT * FROM foo ORDER BY dt
  LOOP
    IF res.bin IS NULL OR res.bin != row.bin THEN
      IF res.bin IS NOT NULL THEN
        RETURN NEXT res;
      END IF;
      res.bin := row.bin;
      res.first := row.dt;
    END IF;
    res.last := row.dt;
  END LOOP;
  IF res.bin IS NOT NULL THEN
RETURN NEXT res;
  END IF;
END;
$$ LANGUAGE plpgsql;


Finally, I'll try to sort out David's solution. Once we correct his typo
(t1.order -> t1.date) and add an 'ORDER BY first' to the end, we get:
   first    |    last    | bin
------------+------------+------
 2009-01-03 | 2009-01-05 | blue
 2009-01-06 | 2009-01-06 | red
 2009-01-07 | 2009-01-08 | blue
 2009-01-09 || red

This includes correct output, but it fails on both edge cases. The
non-appearance of the first row is due to the WHERE clause on the main
SELECT statement:
WHERE (SELECT bin
   FROM foo t2
   WHERE t2.dt < t1.dt
   ORDER BY dt DESC LIMIT 1) <> t1.bin

If we drop this WHERE clause, we get:
   first    |    last    | bin
------------+------------+------
 2009-01-01 | 2009-01-02 | red
 2009-01-02 | 2009-01-02 | red
 2009-01-03 | 2009-01-05 | blue
 2009-01-04 | 2009-01-05 | blue
 2009-01-05 | 2009-01-05 | blue
 2009-01-06 | 2009-01-06 | red
 2009-01-07 | 2009-01-08 | blue
 2009-01-08 | 2009-01-08 | blue
 2009-01-09 || red
 2009-01-10 || red

We can therefore get the result including the first row by selecting from
this table with 'GROUP BY last, bin'.

And we can hack in a value for those final NULLs as a special case. The
following statement works:

SELECT MIN(first),
   CASE WHEN last IS NULL THEN (SELECT MAX(dt) FROM foo) ELSE last
END,
   bin
FROM (
  SELECT dt AS first,
 (SELECT dt
  FROM foo t3
  WHERE t3.dt < (
  SELECT dt
  FROM foo t5
  WHERE t5.dt > t1.dt
AND t5.bin <> t1.bin
  ORDER BY dt ASC
  LIMIT 1)
  ORDER BY dt DESC
  LIMIT 1) AS last,
  bin
  FROM foo t1
) t0
GROUP BY last, bin
ORDER BY last;


Finally, what's efficient? With 1,000,000 random rows, we get:
Enumeration: 13s
PL/SQL

Re: [GENERAL] Killing a data modifying transaction

2009-06-22 Thread William Temperley
2009/6/22 Tom Lane :
> William Temperley  writes:
>> I've got two transactions I tried to kill 3 days ago using "select
>> pg_cancel_backend()", then SIGTERM, and have since then been
>> using 100% of a cpu core each. They were supposed to insert the
>> results of large unions with PostGIS and appear to have failed.
>> Could someone tell me what's the least worst option here please? If I
>> kill -9 will I corrupt my data directory?
>
> No, you'll just lose all your open sessions.
>
> It might be worth trying to identify where they're looping before
> you zap them, though.  A stack trace from gdb would help.
>
>                        regards, tom lane
>

Thanks Tom.

I'm wondering if it happened because I'd started the same query twice.
The first had work_mem = 1MB so I tried to kill it and started another
with work_mem = 1000MB, but both were attempting to insert the same id
into a PK:
"insert into world (geom, id) select st_union(geom), 1 from adminunit
where admin_level = '0'".
Just now when I killed the first process, the other terminated.

I'll run the query again and see if it wasn't just my impatience that
caused it - and post the stack trace if it fails.

Thanks,

Will Temperley.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

2009-06-22 Thread Todd A. Cook

Todd A. Cook wrote:

Tom Lane wrote:

"Todd A. Cook"  writes:

First, the numbers:
  PG Version    Load time    pg_database_size    autovac
  --------------------------------------------------------
  8.2.13        179 min      92,807,992,820      on
  8.3.7         180 min      84,048,744,044      on (defaults)
  8.4b2         206 min      84,028,995,344      on (defaults)
  8.4b2         183 min      84,028,839,696      off
The bulk of the data is in 16 tables, each having about 55 million rows of
the form (int, int, smallint, smallint, int, int, int).  Each table has a
single partial index on one of the integer columns.

Given that it's multiple tables, it'd be possible for autovacuum to
kick in and ANALYZE the data inserted into earlier tables while the
later ones were still being loaded.  If so, the discrepancy might be
explained by 8.4's more-aggressive statistics target, which means that
a background ANALYZE will take about 10x more work than before.

If you have time to repeat the experiments, it would be interesting to
see what happens with consistent default_statistics_target across 8.3
and 8.4.


That would seem to be it:

  8.4b2  183 min  84,028,897,040 on (defaults, 
default_statistics_target=10)

I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend.


The results for this are also consistent with Tom's theory:

   8.3.7   205 min  84,048,866,924 on (defaults, 
default_statistics_target=100)

-- todd

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Killing a data modifying transaction

2009-06-22 Thread Tom Lane
William Temperley  writes:
> I'm wondering if it happened because I'd started the same query twice.
> The first had work_mem = 1MB so I tried to kill it and started another
> with work_mem = 1000MB, but both were attempting to insert the same id
> into a PK:
> "insert into world (geom, id) select st_union(geom), 1 from adminunit
> where admin_level = '0'".
> Just now when I killed the first process, the other terminated.

Well, that last is expected --- as soon as you kill -9 one backend, the
postmaster is going to force-quit all the others and perform a database
restart.  So we don't really know anything more than before.

Given that they'd both be trying to insert the same PK values, it'd be
unsurprising for one of the processes to be blocked waiting to see if
the other one commits.  But didn't you say they were both eating CPU?

I'm personally inclined to suspect some PostGIS oddity here (which means
you might get more help asking about it on the postgis lists).  But
that's mere speculation.  A stack trace showing where it was looping
would've provided something more to go on ...

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Build in spatial support vs. PostGIS

2009-06-22 Thread Johan Nel

Hi Assaf
Assaf Lavie wrote:

Can anyone please shed light on the difference between the two:
http://stackoverflow.com/questions/1023229/spatial-data-in-postgresql
(login _not_ required)



In general, if you store spatial data typically found in a CAD
environment, the built-in spatial features are more than enough.
However, the moment you start having to reference the spatial data to
a position on earth, you really cannot use anything else than PostGIS.


So in general, if you have spatial features all starting from a
reference point of (0,0), use the built-in support.


HTH,

Johan Nel
Pretoria, South Africa.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Select ranges based on sequential breaks

2009-06-22 Thread Joel Nothman
Hi Mike,

I happened upon your query, which is related to some stuff I've been
playing with.

Firstly, David's solution below doesn't work. I haven't yet tried to work
out why.

Secondly, I was hoping to be able to solve your problem nicely with
Postgres 8.4's window functions [1,2], which provide functions relating.

Given the following setup:
CREATE TABLE foo (dt date, bin varchar(4));
INSERT INTO foo VALUES ('2009-01-01', 'red'), ('2009-01-02', 'red'),
('2009-01-03', 'blue'), ('2009-01-04', 'blue'), ('2009-01-05', 'blue'),
('2009-01-06', 'red'), ('2009-01-07', 'blue'), ('2009-01-08', 'blue'),
('2009-01-09', 'red'), ('2009-01-10', 'red');

I had hoped the following would suffice:
SELECT first_value(dt) OVER w, last_value(dt) OVER w, bin FROM foo WINDOW
w AS (ORDER BY dt PARTITION BY bin);

Apparently, this is bad syntax. ORDER BY cannot precede PARTITION BY in a
WINDOW declaration, and yet I wanted a grouping of date-consecutive bins,
which (PARTITION BY bin ORDER BY dt) would not give me.

I was able to produce the required result with:

SELECT MIN(dt) AS first, MAX(dt) AS last, MAX(bin) AS bin
  FROM (SELECT dt, bin,
   CASE WHEN newbin IS NULL
   THEN 0
   ELSE SUM(newbin) OVER (ORDER BY dt)
   END AS binno
  FROM (SELECT *,
   (bin != lag(bin, 1) OVER (ORDER BY dt))::int AS
newbin
  FROM foo
   ) AS newbins
  ) AS binnos
GROUP BY binno
ORDER BY first;

This relies on a middle step in which I create an enumeration of the bins
in sequence:
SELECT dt, bin, CASE WHEN newbin IS NULL THEN 0 ELSE sum(newbin) OVER
(ORDER BY dt) END AS binno FROM (SELECT *, (bin != lag(bin, 1) OVER (ORDER
BY dt))::int AS newbin FROM foo) AS newbins;
     dt     | bin  | binno
------------+------+-------
 2009-01-01 | red  | 0
 2009-01-02 | red  | 0
 2009-01-03 | blue | 1
 2009-01-04 | blue | 1
 2009-01-05 | blue | 1
 2009-01-06 | red  | 2
 2009-01-07 | blue | 3
 2009-01-08 | blue | 3
 2009-01-09 | red  | 4
 2009-01-10 | red  | 4

I would hope there is a neater way to do this with window functions.
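Outside SQL, the same lag-plus-running-sum enumeration can be sketched in a few lines of Python (a hypothetical re-implementation for illustration, using the thread's sample data):

```python
from itertools import groupby

rows = [("2009-01-01", "red"), ("2009-01-02", "red"),
        ("2009-01-03", "blue"), ("2009-01-04", "blue"), ("2009-01-05", "blue"),
        ("2009-01-06", "red"), ("2009-01-07", "blue"), ("2009-01-08", "blue"),
        ("2009-01-09", "red"), ("2009-01-10", "red")]

# Enumerate runs: bump a counter whenever the bin differs from the
# previous row (the role played by lag() plus a running SUM() in SQL).
binno, prev, numbered = 0, None, []
for dt, colour in rows:
    if prev is not None and colour != prev:
        binno += 1
    numbered.append((dt, colour, binno))
    prev = colour

# Collapse each run to (first, last, bin), as the outer GROUP BY does.
ranges = []
for _, grp in groupby(numbered, key=lambda r: r[2]):
    grp = list(grp)
    ranges.append((grp[0][0], grp[-1][0], grp[0][1]))

print(ranges)
```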

The best way to solve your problem may be with PL/pgSQL, which is also good
at dealing with sequences (it's not as complicated as it looks!):

CREATE TYPE bindates AS (first date, last date, bin varchar(5));
CREATE OR REPLACE FUNCTION bindates()
RETURNS SETOF bindates
AS $$
DECLARE
  row record;
  res bindates;
BEGIN
  FOR row IN SELECT * FROM foo ORDER BY dt
  LOOP
IF res.bin IS NULL OR res.bin != row.bin THEN
  IF res.bin IS NOT NULL THEN
RETURN NEXT res;
  END IF;
  res.bin := row.bin;
  res.first := row.dt;
END IF;
res.last := row.dt;
  END LOOP;
  IF res.bin IS NOT NULL THEN
RETURN NEXT res;
  END IF;
END;
$$ LANGUAGE plpgsql;


Finally, I'll try to sort out David's solution. Once we correct his typo
(t1.order -> t1.date) and add an 'ORDER BY first' to the end, we get:
   first    |    last    | bin
------------+------------+------
 2009-01-03 | 2009-01-05 | blue
 2009-01-06 | 2009-01-06 | red
 2009-01-07 | 2009-01-08 | blue
 2009-01-09 |            | red

This includes correct output, but it fails on both edge cases. The
non-appearance of the first row is due to the WHERE clause on the main
SELECT statement:
WHERE (SELECT bin
   FROM foo t2
   WHERE t2.dt < t1.dt
   ORDER BY dt DESC LIMIT 1) <> t1.bin

If we drop this WHERE clause, we get:
   first    |    last    | bin
------------+------------+------
 2009-01-01 | 2009-01-02 | red
 2009-01-02 | 2009-01-02 | red
 2009-01-03 | 2009-01-05 | blue
 2009-01-04 | 2009-01-05 | blue
 2009-01-05 | 2009-01-05 | blue
 2009-01-06 | 2009-01-06 | red
 2009-01-07 | 2009-01-08 | blue
 2009-01-08 | 2009-01-08 | blue
 2009-01-09 |            | red
 2009-01-10 |            | red

We can therefore get the result including the first row by selecting from
this table with 'GROUP BY last, bin'.

And we can hack in a value for those final NULLs as a special case. The
following statement works:

SELECT MIN(first),
   CASE WHEN last IS NULL THEN (SELECT MAX(dt) FROM foo) ELSE last
END,
   bin
FROM (
  SELECT dt AS first,
 (SELECT dt
  FROM foo t3
  WHERE t3.dt < (
  SELECT dt
  FROM foo t5
  WHERE t5.dt > t1.dt
AND t5.bin <> t1.bin
  ORDER BY dt ASC
  LIMIT 1)
  ORDER BY dt DESC
  LIMIT 1) AS last,
  bin
  FROM foo t1
) t0
GROUP BY last, bin
ORDER BY last;


Finally, what's efficient? With 1,000,000 random rows, we get:
Enumeration: 13s
PL/pgSQL: 12s
Modified David: minutes.

[I used the f

Re: [GENERAL] pl/sql resources for pl/pgsql?

2009-06-22 Thread Merlin Moncure
On Sat, Jun 20, 2009 at 4:01 PM, Pavel Stehule wrote:
>> *) misc:
>>  *) never declare a function to return void
>
> ??? why - when we have not procedures ?

The main reason is that functions returning void cannot be used with
the binary protocol.

merlin



Re: [GENERAL] Information about columns

2009-06-22 Thread Dario Teixeira

Hi,

> How are you talking to the database ? ODBC? JDBC? LibPQ? Something else?
> Or do you want this from within PL/PgSQL ?

I'm hacking on a client-side library which talks directly to the Postmaster
using the wire protocol [1]. I need this information to improve some of the
nullability-detection heuristics used by the library.

Cheers,
Dario

[1] http://pgocaml.berlios.de/







Re: [GENERAL] 8.2 instance that won't come up after shutdown

2009-06-22 Thread Scott Marlowe
On Mon, Jun 22, 2009 at 7:17 AM, Abraham, Danny wrote:
> Hi
>
> We have a PG version 8.2  instance that won't come up after shutdown. The log 
> shows the following:
>
> 2009-06-12 21:39:02.669 MDTLOG:  database system was shut down at 2009-06-12 
> 20:56:51 MDT
> 2009-06-12 21:39:02.748 MDTLOG:  checkpoint record is at 9/D712F330
> 2009-06-12 21:39:02.748 MDTLOG:  redo record is at 9/D712F330; undo record is 
> at 0/0; shutdown TRUE
> 2009-06-12 21:39:02.748 MDTLOG:  next transaction ID: 0/8239765; next OID: 
> 25433
> 2009-06-12 21:39:02.748 MDTLOG:  next MultiXactId: 1; next MultiXactOffset: 0
> 2009-06-12 21:39:02.838 MDTLOG:  database system is ready
> 2009-06-12 21:39:02.846 MDTFATAL:  invalid memory alloc request size 
> 8455717278
> 2009-06-12 21:39:02.862 MDTLOG:  startup process (PID 6198) exited with exit 
> code 1
> 2009-06-12 21:39:02.864 MDTLOG:  aborting startup due to startup process 
> failure
> 2009-06-12 21:39:02.928 MDTLOG:  logger shutting down
>
> We tried to set the autovacuum parameter value to off but unfortunately it 
> did not help
>
> 1. Is this bug defined for the 8.2.4 code line also (We know that it is 
> defined for 8.1.X) 2. How can we recreate the problem?
> 3. Is there a fix for that bug?
>
> Any help will be appreciated.

I'd update to 8.2.latest and see if it then comes up.



Re: [GENERAL] Select ranges based on sequential breaks

2009-06-22 Thread Scott Marlowe
On Mon, Jun 15, 2009 at 12:23 PM, Mike Toews wrote:
> This is easy to compute using a spreadsheet or in R, but how would I do this
> with SQL? I'm using 8.3. Advice is appreciated.

FYI (and I'm no expert in this area) R is available as a pl for
postgres, look for pl/R or plR



Re: [GENERAL] Killing a data modifying transaction

2009-06-22 Thread William Temperley
2009/6/22 Tom Lane :
> William Temperley  writes:
>> I'm wondering if it happened because I'd started the same query twice.
>> The first had work_mem = 1MB so I tried to kill it and started another
>> with work_mem = 1000MB, but both were attempting to insert the same id
>> into a PK:
>> "insert into world (geom, id) select st_union(geom), 1 from adminunit
>> where admin_level = '0'".
>> Just now when I killed the first process, the other terminated.
>
> Well, that last is expected --- as soon as you kill -9 one backend, the
> postmaster is going to force-quit all the others and perform a database
> restart.  So we don't really know anything more than before.
>
> Given that they'd both be trying to insert the same PK values, it'd be
> unsurprising for one of the processes to be blocked waiting to see if
> the other one commits.  But didn't you say they were both eating CPU?
>
> I'm personally inclined to think some PostGIS oddity here (which means
> you might get more help asking about it on the postgis lists).  But
> that's mere speculation.  A stack trace showing where it was looping
> would've provided something more to go on ...
>

Yes they were both eating CPU. I've tried the query again and it seems
to be stuck again - the trace has been the same for an hour or so now.
I guess I'd best post to the PostGIS list.

#0  0x7fae68ec6a75 in
geos::DefaultCoordinateSequence::~DefaultCoordinateSequence () from
/usr/lib/libgeos.so.2
#1  0x7fae68ed433e in geos::LineString::~LineString () from
/usr/lib/libgeos.so.2
#2  0x7fae68ed23c7 in geos::LinearRing::~LinearRing () from
/usr/lib/libgeos.so.2
#3  0x7fae68f20e68 in geos::PolygonBuilder::findEdgeRingContaining
() from /usr/lib/libgeos.so.2
#4  0x7fae68f21740 in geos::PolygonBuilder::placeFreeHoles () from
/usr/lib/libgeos.so.2
#5  0x7fae68f218d6 in geos::PolygonBuilder::add () from
/usr/lib/libgeos.so.2
#6  0x7fae68f21a73 in geos::PolygonBuilder::add () from
/usr/lib/libgeos.so.2
#7  0x7fae68f20204 in geos::OverlayOp::computeOverlay () from
/usr/lib/libgeos.so.2
#8  0x7fae68f203e9 in geos::OverlayOp::getResultGeometry () from
/usr/lib/libgeos.so.2
#9  0x7fae68f206bb in geos::OverlayOp::overlayOp () from
/usr/lib/libgeos.so.2
#10 0x7fae68ecbfcc in geos::Geometry::Union () from /usr/lib/libgeos.so.2
#11 0x7fae693c323c in GEOSUnion () from /usr/lib/libgeos_c.so.1
#12 0x7fae695f3b4a in unite_garray () from
/usr/lib/postgresql/8.3/lib/liblwgeom.so
#13 0x005387bb in ?? ()
#14 0x00538aa7 in ExecAgg ()
#15 0x0052e08d in ExecProcNode ()
#16 0x00542b60 in ?? ()
#17 0x00534916 in ExecScan ()
#18 0x0052e11d in ExecProcNode ()
#19 0x0052d1e2 in ExecutorRun ()
#20 0x005c73c2 in ?? ()
#21 0x005c75d5 in ?? ()
#22 0x005c7e74 in PortalRun ()
#23 0x005c394a in ?? ()
#24 0x005c47bc in PostgresMain ()
#25 0x00599424 in ?? ()
#26 0x0059a1a1 in PostmasterMain ()
#27 0x0055123e in main ()

Best regards,

Will



Re: [GENERAL] FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

2009-06-22 Thread Tom Lane
"Todd A. Cook"  writes:
> Todd A. Cook wrote:
>> Tom Lane wrote:
>>> If you have time to repeat the experiments, it would be interesting to
>>> see what happens with consistent default_statistics_target across 8.3
>>> and 8.4.
>> 
>> That would seem to be it:
>> 8.4b2  183 min  84,028,897,040 on (defaults, 
>> default_statistics_target=10)
>> 
>> I'll run the test on 8.3.7 with default_statistics_target=100 over the 
>> weekend.

> The results for this are also consistent with Tom's theory:
> 8.3.7   205 min  84,048,866,924 on (defaults, 
> default_statistics_target=100)

OK, thanks for following up.  So this is a different effect from the
COPY ring buffer size issue being argued about over on pgsql-hackers.
I think we can just say that this one is a price being paid
intentionally for better statistics, and if you don't need better
statistics you can back off the target setting ...

regards, tom lane



Re: [GENERAL] Information about columns

2009-06-22 Thread John DeSoi


On Jun 22, 2009, at 11:43 AM, Dario Teixeira wrote:

How are you talking to the database? ODBC? JDBC? LibPQ? Something else?
Or do you want this from within PL/PgSQL?

I'm hacking on a client-side library which talks directly to the Postmaster
using the wire protocol [1]. I need this information to improve some of the
nullability-detection heuristics used by the library.


The information you want is always returned from the query as a row  
description message. This includes the type oid of real and computed  
columns.


See the RowDescription message on this page for details:

http://www.postgresql.org/docs/8.3/static/protocol-message-formats.html
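As a rough sketch of what that message looks like on the wire, the body of a v3 RowDescription (after the 'T' type byte and the Int32 length) can be unpacked with a few struct calls; the sample bytes below are hand-built for illustration, not a live capture:

```python
import struct

def parse_row_description(body: bytes):
    """Parse the body of a v3 RowDescription message: an Int16 field
    count, then per field a NUL-terminated name followed by table OID
    (Int32), column attnum (Int16), type OID (Int32), type size (Int16),
    type modifier (Int32), and format code (Int16), all big-endian."""
    (nfields,), pos = struct.unpack_from(">h", body, 0), 2
    fields = []
    for _ in range(nfields):
        end = body.index(b"\x00", pos)
        name = body[pos:end].decode()
        table_oid, attnum, type_oid, size, typmod, fmt = \
            struct.unpack_from(">ihihih", body, end + 1)
        fields.append({"name": name, "table_oid": table_oid,
                       "type_oid": type_oid, "format": fmt})
        pos = end + 1 + 18  # 18 bytes of fixed-width metadata per field
    return fields

# A hand-built body describing a single int4 column "quant" whose table
# OID is 0, which is what a non-table result column reports.
body = struct.pack(">h", 1) + b"quant\x00" + \
       struct.pack(">ihihih", 0, 0, 23, 4, -1, 0)
print(parse_row_description(body))
```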



John DeSoi, Ph.D.







Re: [GENERAL] Select ranges based on sequential breaks

2009-06-22 Thread Mike Toews

Hi Joel,

Window functions appear to be the best solution for this style of 
problem, and I'm looking forward to applying them. However, I'm 
sticking with 8.3 for at least a year, so I'm not able to explore this 
solution yet. For now, I can only post-process the output in a non-SQL 
environment. I also need to do other fun stuff, like cumulative sums, 
which is also challenging with SQL but much easier and more intuitive in R.


An excellent book that I recently stumbled on is /Joe Celko's SQL for 
Smarties/ (recommended by someone on this list), which is a heavy read 
but covers ANSI SQL in amazing depth. Of particular interest is "Chapter 
24: Regions, Runs, Gaps, Sequences, and Series". I'm slowly digesting this.


-Mike

Joel Nothman wrote:

Hi Mike,

I happened upon your query, which is related to some stuff I've been
playing with.

Firstly, David's solution below doesn't work. I haven't yet tried to work
out why.

Secondly, I was hoping to be able to solve your problem nicely with
Postgres 8.4's window functions [1,2], which can provide access to
data in sequentially-related rows.

Given the following setup:
CREATE TABLE foo (dt date, bin varchar(4));
INSERT INTO foo VALUES ('2009-01-01', 'red'), ('2009-01-02', 'red'),
('2009-01-03', 'blue'), ('2009-01-04', 'blue'), ('2009-01-05', 'blue'),
('2009-01-06', 'red'), ('2009-01-07', 'blue'), ('2009-01-08', 'blue'),
('2009-01-09', 'red'), ('2009-01-10', 'red');

I had hoped the following would suffice:
SELECT first_value(dt) OVER w, last_value(dt) OVER w, bin FROM foo WINDOW
w AS (ORDER BY dt PARTITION BY bin);

Apparently, this is bad syntax. ORDER BY cannot precede PARTITION BY in a
WINDOW declaration, and yet I wanted a grouping of date-consecutive bins,
which (PARTITION BY bin ORDER BY dt) would not give me.

I was able to produce the required result with:

SELECT MIN(dt) AS first, MAX(dt) AS last, MAX(bin) AS bin
  FROM (SELECT dt, bin,
   CASE WHEN newbin IS NULL
   THEN 0
   ELSE SUM(newbin) OVER (ORDER BY dt)
   END AS binno
  FROM (SELECT *,
   (bin != lag(bin, 1) OVER (ORDER BY dt))::int AS
newbin
  FROM foo
   ) AS newbins
  ) AS binnos
GROUP BY binno
ORDER BY first;

This relies on a middle step in which I create an enumeration of the bins
in sequence:
SELECT dt, bin, CASE WHEN newbin IS NULL THEN 0 ELSE sum(newbin) OVER
(ORDER BY dt) END AS binno FROM (SELECT *, (bin != lag(bin, 1) OVER (ORDER
BY dt))::int AS newbin FROM foo) AS newbins;
     dt     | bin  | binno
------------+------+-------
 2009-01-01 | red  | 0
 2009-01-02 | red  | 0
 2009-01-03 | blue | 1
 2009-01-04 | blue | 1
 2009-01-05 | blue | 1
 2009-01-06 | red  | 2
 2009-01-07 | blue | 3
 2009-01-08 | blue | 3
 2009-01-09 | red  | 4
 2009-01-10 | red  | 4

I would hope there is a neater way to do this with window functions.

The best way to solve your problem may be with PL/pgSQL, which is also good
at dealing with sequences (it's not as complicated as it looks!):

CREATE TYPE bindates AS (first date, last date, bin varchar(5));
CREATE OR REPLACE FUNCTION bindates()
RETURNS SETOF bindates
AS $$
DECLARE
  row record;
  res bindates;
BEGIN
  FOR row IN SELECT * FROM foo ORDER BY dt
  LOOP
IF res.bin IS NULL OR res.bin != row.bin THEN
  IF res.bin IS NOT NULL THEN
RETURN NEXT res;
  END IF;
  res.bin := row.bin;
  res.first := row.dt;
END IF;
res.last := row.dt;
  END LOOP;
  IF res.bin IS NOT NULL THEN
RETURN NEXT res;
  END IF;
END;
$$ LANGUAGE plpgsql;


Finally, I'll try to sort out David's solution. Once we correct his typo
(t1.order -> t1.date) and add an 'ORDER BY first' to the end, we get:
   first    |    last    | bin
------------+------------+------
 2009-01-03 | 2009-01-05 | blue
 2009-01-06 | 2009-01-06 | red
 2009-01-07 | 2009-01-08 | blue
 2009-01-09 |            | red

This includes correct output, but it fails on both edge cases. The
non-appearance of the first row is due to the WHERE clause on the main
SELECT statement:
WHERE (SELECT bin
   FROM foo t2
   WHERE t2.dt < t1.dt
   ORDER BY dt DESC LIMIT 1) <> t1.bin

If we drop this WHERE clause, we get:
   first    |    last    | bin
------------+------------+------
 2009-01-01 | 2009-01-02 | red
 2009-01-02 | 2009-01-02 | red
 2009-01-03 | 2009-01-05 | blue
 2009-01-04 | 2009-01-05 | blue
 2009-01-05 | 2009-01-05 | blue
 2009-01-06 | 2009-01-06 | red
 2009-01-07 | 2009-01-08 | blue
 2009-01-08 | 2009-01-08 | blue
 2009-01-09 |            | red
 2009-01-10 |            | red

We can therefore get the result including the first row by selecting from
this table with 'GROUP BY last,

Re: [GENERAL] Select ranges based on sequential breaks

2009-06-22 Thread Scott Marlowe
On Mon, Jun 22, 2009 at 12:41 PM, Mike Toews wrote:
> Hi Joel,
> An excellent book that I recently stumbled on is /Joe Celko's SQL for
> Smarties/ (recommended by someone on this list), which is a heavy read, but
> has an amazing depth to ANSI SQL. Of particular interest is "Chapter 24:
> Regions, Runs, Gaps, Sequences, and Series". I'm slowly digesting this.

I need to buy a fourth or fifth copy (they keep growing feet / I keep
forgetting who I loaned them to) of that book.



Re: [GENERAL] Information about columns

2009-06-22 Thread Dario Teixeira

Hi,

> The information you want is always returned from the query
> as a row description message. This includes the type oid of
> real and computed columns.
> 
> See the RowDescription message on this page for details:

Thanks for the reply.  Note that it is in fact RowDescription that PG'OCaml
is already using to obtain the type OID of real and computed columns.
The problem is that in some circumstances RowDescription does not provide
a pg_class OID where I (perhaps naïvely) expect it.

To be more precise, when issuing a SELECT for tables and views, the
associated pg_class OID is always provided.  Doing a SELECT on 'foobar'
and 'foobar1' will work:

CREATE TABLE foobar (quant int);

CREATE VIEW foobar1 AS SELECT * FROM foobar;

*However*, if I create a new type (which has an associated pg_class entry),
and define a function which returns a SETOF that type, RowDescription will
not tell me its OID.  For example:

CREATE TYPE foobar_t AS (quant int);

CREATE FUNCTION foobar2 () RETURNS SETOF foobar_t AS
'SELECT * FROM foobar' LANGUAGE sql STABLE;

Is this a bug or a conscious decision?  And in the latter case, how can
I retrieve the pg_class OID of foobar_t?

Thanks again,
Dario Teixeira







[GENERAL] Compiling Pl/Java

2009-06-22 Thread Jorge Vidal - Disytel
Hi community,
I've been trying to compile pl/java from CVS with no success.  Is it
going to be available for PG 8.4?
We are going to run some intensive tests on our system, and I thought it
was a good idea to test it under 8.4, but for that I need pl/java...

Jorge



Re: [GENERAL] Compiling Pl/Java

2009-06-22 Thread Kris Jurka



On Mon, 22 Jun 2009, Jorge Vidal - Disytel wrote:


   I've been trying to compile pl/java from CVS with no success.  Is it
going to be available for PG 8.4?


pl/java CVS builds against 8.4.  A common gotcha is that pljava will only 
build with JDK 1.4 or 1.5 and fails with 1.6.  It can be run with a 1.6 
JVM, just not built with it.


Kris Jurka



[GENERAL] Replication

2009-06-22 Thread Gerry Reno
I noticed that the user survey on the community page does not list 
replication among the choices for development priority.  For me, 
replication is the most important thing that is critically missing from 
postgresql.  We need something as good as MySQL Replication.  Both 
statement-based and row-based replication.  And support for 
Master-Master and full cyclic replication setups.  Postgresql is just a 
toy database without this as far as I am concerned.


Regards,
Gerry




Re: [GENERAL] Select ranges based on sequential breaks

2009-06-22 Thread Mike Toews

Scott Marlowe wrote:

On Mon, Jun 15, 2009 at 12:23 PM, Mike Toews wrote:
  

This is easy to compute using a spreadsheet or in R, but how would I do this
with SQL? I'm using 8.3. Advice is appreciated.



FYI (and I'm no expert in this area) R is available as a pl for
postgres, look for pl/R or plR
  
FYI, here is how I implement ranges on sequential breaks in R. Sorry, I 
haven't meddled with plR yet, although I'm experienced with both R and 
postgres. This is all R code:


# Randomly sampled bins: "red", "blue"
dat <- data.frame(date=seq(as.Date("2009-01-01"), by="days", length.out=20))
dat$bin <- factor(sample(c("red","blue"), 10, replace=TRUE, 
prob=c(0.4,0.6)))


# Determine where the rows are different; 1=different rows, 0=same rows
dat$breaks <- ifelse(dat$bin != c(TRUE, 
as.character(dat$bin[-nrow(dat)])), 1, 0)


# Determine where the continuous parts are:
dat$part <- factor(cumsum(dat$breaks))

# Results vary due to random sampling
print(dat)


... and on the SQL side, simple aggregates like min(), max(), etc. can be 
used with "GROUP BY part" to determine the start/end dates, length of 
duration, etc.


-Mike





[GENERAL] Slight change in query leads to unexpected change in query plan

2009-06-22 Thread Jack Orenstein

I have a table that looks like this:

  create table T(pk int not null, value bytea, ..., primary key(pk))

I want to scan the table in batches of 100. I'll do this by issuing a sequence 
of queries like this:


select *
from T
where pk > ?
and value = ?
order by pk
limit 100

After each query, I'll record the last value seen and use that to
drive the next query. The obvious (to me) execution plan is to use the
index, do an index scan, and then filter using the restriction on
value. I have some realistic data (loaded into postgres 8.3.7) and
I've run analyze. I'm not getting a very good execution plan:

ris=# explain
ris-# select *
ris-# from T
ris-# where pk > 10
ris-# and value = 'asdf'::bytea
ris-# order by pk
ris-# limit 100;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=78352.20..78352.24 rows=16 width=451)
   ->  Sort  (cost=78352.20..78352.24 rows=16 width=451)
         Sort Key: pk
         ->  Bitmap Heap Scan on t  (cost=2091.60..78351.88 rows=16 width=451)
               Recheck Cond: (pk > 10)
               Filter: (value = 'asdf'::bytea)
               ->  Bitmap Index Scan on t_pkey  (cost=0.00..2091.60 rows=91088 width=0)
                     Index Cond: (pk > 10)

But if I remove the value restriction, I get the plan I was hoping for:

ris=# explain
ris-# select *
ris-# from T
ris-# where pk > 10
ris-# order by pk
ris-# limit 100;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=0.00..324.99 rows=100 width=451)
   ->  Index Scan using t_pkey on t  (cost=0.00..296027.98 rows=91088 width=451)
         Index Cond: (pk > 10)
(3 rows)

Why is this?

This is an obvious rewrite, e.g.

select *
from (select * from T where pk > ? order by pk limit 100) x
where value = ?

and this produces a good query plan.  But it means that fewer than
100 rows may be returned per batch. For reasons too boring to go into,
that would be very inconvenient for my application.
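For what it's worth, the batching scheme described at the top of the message (remember the last pk seen and feed it into the next query) is keyset pagination; the driving loop can be sketched in Python with a stand-in fetch function in place of a real database call (all names here are made up for illustration):

```python
def fetch_batch(rows, last_pk, limit=100):
    """Stand-in for: SELECT * FROM t WHERE pk > %s ORDER BY pk LIMIT %s.
    'rows' is a pk-sorted list of (pk, value) tuples."""
    return [r for r in rows if r[0] > last_pk][:limit]

def scan_all(rows, batch=100):
    last_pk, out = 0, []
    while True:
        chunk = fetch_batch(rows, last_pk, batch)
        if not chunk:
            break
        out.extend(chunk)
        last_pk = chunk[-1][0]  # drive the next query from the last pk seen
    return out

data = [(pk, "asdf") for pk in range(1, 251)]
print(len(scan_all(data)))  # prints 250
```

With an extra value filter applied inside the limited subquery, each chunk can come back shorter than the batch size, which is exactly the inconvenience described above.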

Why does adding the value restriction so radically change the execution plan?

Jack Orenstein



Re: [GENERAL] Information about columns

2009-06-22 Thread John DeSoi


On Jun 22, 2009, at 4:57 PM, Dario Teixeira wrote:

*However*, if I create a new type (which has an associated pg_class entry),
and define a function which returns a SETOF that type, RowDescription will
not tell me its OID.  For example:

CREATE TYPE foobar_t AS (quant int);

CREATE FUNCTION foobar2 () RETURNS SETOF foobar_t AS
'SELECT * FROM foobar' LANGUAGE sql STABLE;

Is this a bug or a conscious decision?  And in the latter case, how can
I retrieve the pg_class OID of foobar_t?


I don't think it is a bug, because the documentation clearly states "if
the field can be identified as a column of a specific table, the
object ID of the table; otherwise zero." A type is not the same as a
table.


It is not as elegant as you would like, but maybe one idea is to
create your own alias of the built-in type so you can determine the
answer just by looking at the column type. For example, instead of
using "int" in CREATE TYPE above, create your own type equivalent to
an integer.




John DeSoi, Ph.D.







Re: [GENERAL] Replication

2009-06-22 Thread Joshua D. Drake
On Mon, 2009-06-22 at 17:53 -0400, Gerry Reno wrote:
> I noticed that the user survey on the community page does not list 
> replication among the choices for development priority.  For me, 
> replication is the most important thing that is critically missing from 
> postgresql.  We need something as good as MySQL Replication.  Both 
> statement-based and row-based replication.  And support for 
> Master-Master and full cyclic replication setups.  Postgresql is just a 
> toy database without this as far as I am concerned.

Funny.

Joshua D. Drake


> 
> Regards,
> Gerry
> 
> 
-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [GENERAL] Replication

2009-06-22 Thread Kevin Barnard


On Jun 22, 2009, at 4:53 PM, Gerry Reno wrote:

I noticed that the user survey on the community page does not list  
replication among the choices for development priority.  For me,  
replication is the most important thing that is critically missing  
from postgresql.  We need something as good as MySQL Replication.   
Both statement-based and row-based replication.  And support for  
Master-Master and full cyclic replication setups.  Postgresql is  
just a toy database without this as far as I am concerned.


Regards,
Gerry



Google postgresql replication.  There are multiple replication /
clustering options depending on your needs.  It's not built in to the
DB, nor should it be, because everyone has different replication needs.


The idea of separating replication functionality from the core DB
product isn't new.  AFAIK IBM has always done this on their big-iron
DB2.  Granted, their cheap replication software costs more than
you paid for that server that is running MySQL, and the expensive
replication probably costs more than a cabinet's worth of MySQL
servers. :-)


--
Kevin Barnard
kevin.barn...@laser2mail.com






Re: [GENERAL] Replication

2009-06-22 Thread Gerry Reno

Kevin Barnard wrote:


On Jun 22, 2009, at 4:53 PM, Gerry Reno wrote:

I noticed that the user survey on the community page does not list 
replication among the choices for development priority. For me, 
replication is the most important thing that is critically missing 
from postgresql. We need something as good as MySQL Replication. Both 
statement-based and row-based replication. And support for 
Master-Master and full cyclic replication setups. Postgresql is just 
a toy database without this as far as I am concerned.


Regards,
Gerry



Google postgresql replication. There are multiple replication / 
clustering options depending on your needs. It's not built in to the DB, 
nor should it be, because everyone has different replication needs.


The idea of separating replication functionality from the core DB 
product isn't new. AFAIK IBM has always done this on their big-iron 
DB2. Granted, their cheap replication software costs more than 
you paid for that server that is running MySQL, and the expensive 
replication probably costs more than a cabinet's worth of MySQL servers. 
:-)


--
Kevin Barnard
kevin.barn...@laser2mail.com


Have you ever tried any of the postgresql replication offerings? The 
only one that is remotely viable is Slony, and it is so quirky you may as 
well forget it. The rest are in some stage of decay/abandonment. There 
is no real replication available for postgresql. Postgresql needs to 
develop a real replication offering, either built in or as a 
separate module.


Regards,
Gerry




Re: [GENERAL] Replication

2009-06-22 Thread Joshua D. Drake
On Mon, 2009-06-22 at 18:28 -0400, Gerry Reno wrote:
> Kevin Barnard wrote:

> >
> Have you ever tried any of the postgresql replication offerings? The 
> only one that is remotely viable is slony and it is so quirky you may as 
> well forget it. The rest are in some stage of decay/abandonment. There 
> is no real replication available for postgresql. Postgresql needs to 
> develop a real replication offering for postgresql. Builtin or a 
> separate module.

Well this certainly isn't true but what do I know.

Joshua D. Drake



-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [GENERAL] Replication

2009-06-22 Thread Gerry Reno

Joshua D. Drake wrote:

On Mon, 2009-06-22 at 18:28 -0400, Gerry Reno wrote:
  

Kevin Barnard wrote:



  
Have you ever tried any of the postgresql replication offerings? The 
only one that is remotely viable is slony and it is so quirky you may as 
well forget it. The rest are in some stage of decay/abandonment. There 
is no real replication available for postgresql. Postgresql needs to 
develop a real replication offering for postgresql. Builtin or a 
separate module.



Well this certainly isn't true but what do I know.

Joshua D. Drake



  
It is true.  Otherwise show me a viable replication offering for 
postgresql that I can put into production and obtain support for it.


Regards,
Gerry




Re: [GENERAL] Replication

2009-06-22 Thread Joshua D. Drake
On Mon, 2009-06-22 at 18:35 -0400, Gerry Reno wrote:
> Joshua D. Drake wrote:

> It is true.  Otherwise show me a viable replication offering for 
> postgresql that I can put into production and obtain support for it.

Well, you can get support for Slony (known to be a bit complicated
but stable and flexible). You can also get support for Londiste (which
is used in production by Skype... I think that speaks for itself). You
can get support for log shipping if all you need is simple master->slave
redundancy.

I can name others if you like but since you are clearly not able to
effectively use Google nor actually present production requirements so
people can help you, I doubt it would do much good.

Joshua D. Drake

> 
> Regards,
> Gerry
> 
> 
-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997




Re: [GENERAL] Replication

2009-06-22 Thread Gerry Reno

Joshua D. Drake wrote:
> On Mon, 2009-06-22 at 18:35 -0400, Gerry Reno wrote:
>> Joshua D. Drake wrote:
>>
>> It is true.  Otherwise show me a viable replication offering for
>> postgresql that I can put into production and obtain support for it.
>
> Well, you can get support for Slony (known to be a bit complicated
> but stable and flexible).

I've already tried Slony last year and unless something major has
changed it is not viable.  I cannot have replication that just stops for
no known reason.

> You can also get support for Londiste (which
> is used in production by Skype... I think that speaks for itself).

Londiste is beta.  The fact that Skype uses it is because it's part of
Skytools, which is their product.  They may want to run their own beta
stuff.  I don't.

> You
> can get support for log shipping if all you need is simple master->slave
> redundancy.

If all I needed was log shipping I could do that myself with some scripts.

> I can name others if you like but since you are clearly not able to
> effectively use Google nor actually present production requirements so
> people can help you, I doubt it would do much good.
>
> Joshua D. Drake

So name others.

Regards,
Gerry




[GENERAL] Inserting Values into Interval

2009-06-22 Thread BlackMage

Hey,

I am having a small issue when entering values into an interval field. Say
I want to enter a time of 2:03, two minutes and 3 seconds. When I insert
that into an interval field, it comes up as 02:03:00, 2 hours, 3 minutes.
The only way I've gotten around this so far is by doing 00:02:03. But I was
wondering if there is another way of inserting into an interval field where
the values start at the lower end first, so the result will be 00:02:03
when 2:03 is inserted?

This will make developing applications so much easier if there is, so thanks.
-- 
View this message in context: 
http://www.nabble.com/Inserting-Values-into-Interval-tp24153731p24153731.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.




[GENERAL] Why my queryes doesnt not use indexes?

2009-06-22 Thread DaNieL
Hi guys, I am having trouble with some simple queries that don't seem
to use any index, and I don't know why.
My test database structure is this one:

--
CREATE TABLE users(
id BIGSERIAL NOT NULL PRIMARY KEY,
nickname varchar(50),
email varchar(50) NOT NULL
);

CREATE INDEX users_nick_index ON users (nickname);
CREATE UNIQUE INDEX users_email_uindex ON users (email);

INSERT INTO users (nickname, email) VALUES ('foo', 'f...@example.com');
INSERT INTO users (nickname, email) VALUES ('bar', 'b...@example.com');
-

Now, I populated the table with around 5000 rows.

If I run this query:


EXPLAIN SELECT email FROM users WHERE nickname = 'Errol'


The result is:

QUERY PLAN
Bitmap Heap Scan on users  (cost=4.37..36.04 rows=15 width=28)
  Recheck Cond: ((nickname)::text = 'Errol'::text)
  ->  Bitmap Index Scan on users_nick_index  (cost=0.00..4.36 rows=15
width=0)
Index Cond: ((nickname)::text = 'Errol'::text)
---

So it seems to use the index... but if I use LIKE:

-
EXPLAIN SELECT email FROM users WHERE nickname LIKE 'E'
--
PostgreSQL doesn't use any index, and runs with a seq scan:

-
QUERY PLAN
Seq Scan on users  (cost=0.00..112.05 rows=15 width=28)
  Filter: ((nickname)::text ~~ 'E'::text)
--

Can anyone explain why?

I'm just structuring a db for a new application; if there is any
problem I'd like to solve it now ;)



Re: [GENERAL] Trigger Function and backup

2009-06-22 Thread Arndt Lehmann
On Jun 16, 7:48 pm, nishkars...@rediffmail.com (Nishkarsh) wrote:
> Hi Merlin, thanks for the detailed input.
>
> As per ur suggestion i will try to implement Slony-I.
>
> I think i will need some help to do it.
>
> I am useing Postgres 8.3.7, on Windows.
>
> I was following the Slony-I example in the help for pgAdmin III. I am able
> to perform the steps from 1-7. Step 8 : create Slony-I cluster i am getting
> a msg in the interface
>
> "Slony-I creation script no available; only join possible"
>
> On doing some research i found some scripts to be copied (I was not able to
> find very clear instruction) or give slony-I path. i tried all that but was
> not able to move ahead.
>
> Can u plz guide me through &-(%-|
>
> Regards
> Nishkarsh
>
>
>
> Merlin Moncure-2 wrote:
>
> > On Mon, Jun 15, 2009 at 4:29 AM, Nishkarsh
> > wrote:
> >> Hello every one,
>
> >> I am new to databases. I am using Postgres 8.2 (Migrating to 8.3.7 in few
> >> days) on windows platform.
>
> >> I had tried using Slony-I for replication and was not able to create a
> >> cluster.
>
> >> After struggling for some time i decide to implement a way around to take
> >> differential backup. As the tables getting changed were very less.
>
> >> Here is what i intend to do:
>
> >> - Write a trigger for each of the tables in concern
> >> - Some how write a function which can copy / execute the same query in
> >> another temp Db on the same physical system (I have no idea how to do
> >> that)
> >> - Take a backup of temp DB which will be the differential backup of DB
> >> (We
> >> need to clear temp db after backup)
>
> >> Am i going in the right direction?
> >> Is there any way i can implement it.
> >> Any help will be really of great help
>
> > Generating a full trigger based replication system on your own is
> > IMNSHO crazy.  Slony is the best solution to this problem (trigger
> > replication with postgres) that I know of, and is probably better than
> > any one person to come up with in a reasonable amount of time.
> > Probably, your best course of action if you need to get things running
> > right now is to give slony another go (why did you not succeed?).
>
> > Hand written trigger replication is ok if you need to copy, say, a
> > couple of tables or you have some other very specific requirement.  In
> > particular, copying an insert to a mirror database with trigger
> > function wrapping dblink is a snap (updates are more problematic, but
> > doable).  Of course, you need to figure out how to deal with schema
> > updates and other issues that plague replication systems such as
> > volatile data in cascading triggers (just to name one).  General
> > purpose trigger replication is a huge project...
>
> > It sounds to me that what you really want is the 'hot standby' feature
> > that unfortunately missed the cut for 8.4.  Hot standby is probably
> > the easiest way to mirror a database for purposes of read only
> > querying.  There are no triggers to worry about, just a few .conf
> > settings and some other setup to get going (more or less, it isn't
> > finalized yet).  So maybe, waiting for hot standby (or even, digging
> > up a hot standby patch and trying to apply it vs. 8.4 if your
> > adventurous) is the answer.
>
> > Another possibility is to look at statement level replication, like
> > pgpool.
>
> > merlin
>
> > --
> > Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> > To make changes to your subscription:
> >http://www.postgresql.org/mailpref/pgsql-general
>
> --
> View this message in 
> context:http://www.nabble.com/Trigger-Function-and-backup-tp24030638p24051851...
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> To make changes to your 
> subscription:http://www.postgresql.org/mailpref/pgsql-general

Hi Merlin,

you could also consider giving "rubyrep" a try.
Like Slony it is also an open source, trigger based, asynchronous
replication solution.

The focus of rubyrep is easy setup.
Tutorial, screencast and other information are available on the
project website:
http://www.rubyrep.org

Best Regards,
  Arndt Lehmann

(Disclosure: I wrote rubyrep)



Re: [GENERAL] Slight change in query leads to unexpected change in query plan

2009-06-22 Thread Sam Mason
On Mon, Jun 22, 2009 at 05:55:28PM -0400, Jack Orenstein wrote:
> ris-# select *
> ris-# from T
> ris-# where pk > 10
> ris-# and value = 'asdf'::bytea
> ris-# order by pk
> ris-# limit 100;

PG thinks that you're going to get 16 rows back matching those
conditions; bitmap heap scans are faster in some cases, this is likely
to be one of those cases, and so PG is optimizing things correctly.

>  Limit  (cost=78352.20..78352.24 rows=16 width=451)

> ris-# select *
> ris-# from T
> ris-# where pk > 10
> ris-# order by pk
> ris-# limit 100;

With this query, PG thinks that you may get 91088 rows back, but because
you've got a LIMIT in there you only need the first 100 of them.  It
will therefore prefer a plan that can stop short, and thus prefers
an index scan.

>  Limit  (cost=0.00..324.99 rows=100 width=451)
>->  Index Scan using t_pkey on t  (cost=0.00..296027.98 rows=91088 
> width=451)


> Why does adding the value restriction so radically change the execution 
> plan?

PG doesn't have any cross-column statistics and hence it assumes that pk
and value are uncorrelated.  You may get better results by increasing
the statistics target[1] for those columns, as that will give PG more
information, but if the columns are indeed correlated then that's not
going to help.
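
For reference, the statistics-target change can be sketched as follows
(the target value of 1000 is only an illustrative choice; remember to
re-run ANALYZE so the planner actually sees the extra detail):

```sql
-- Raise the per-column statistics target for the columns in the WHERE
-- clause, then regather statistics.
ALTER TABLE t ALTER COLUMN value SET STATISTICS 1000;
ALTER TABLE t ALTER COLUMN pk SET STATISTICS 1000;
ANALYZE t;
```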

-- 
  Sam  http://samason.me.uk/
 
 [1] http://www.postgresql.org/docs/current/static/sql-altertable.html



Re: [GENERAL] Why my queryes doesnt not use indexes?

2009-06-22 Thread Sam Mason
On Mon, Jun 22, 2009 at 08:43:43AM -0700, DaNieL wrote:
> Hi guys, i am in trouble with some simple data that seem like doesnt
> use any index.. and i dont know why.

It can be for a couple of reasons; firstly, using an index isn't always
a good thing.  In your case I'd guess you want to do the following when
creating the index:

> CREATE INDEX users_nick_index ON users (nickname varchar_pattern_ops);

That will allow PG to use the index in LIKE expressions.  For more
details see:

  http://www.postgresql.org/docs/current/static/indexes-opclass.html
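
A minimal sketch of the difference (the index name is illustrative; the
table is the one from the original post):

```sql
-- An index built with varchar_pattern_ops can serve left-anchored
-- LIKE patterns even in a non-C locale:
CREATE INDEX users_nick_pattern_idx
    ON users (nickname varchar_pattern_ops);

-- Left-anchored pattern: eligible for an index scan.
EXPLAIN SELECT email FROM users WHERE nickname LIKE 'E%';

-- A leading wildcard still requires a seq scan, index or not.
EXPLAIN SELECT email FROM users WHERE nickname LIKE '%rrol';
```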

-- 
  Sam  http://samason.me.uk/



Re: [GENERAL] postgresql-8.3.7 unexpected connection closures

2009-06-22 Thread Andrew Maclean
[snipity snip snip]

Notwithstanding all the previous discussion. I still think there is a
problem with postgresql on windows.

Messages in the log are consistently of the form:
2009-06-23 08:28:26 EST WARNING:  worker took too long to start; cancelled
FATAL:  could not reattach to shared memory (key=252, addr=023F): 487
2009-06-23 08:35:58 EST WARNING:  worker took too long to start; cancelled
FATAL:  could not reattach to shared memory (key=252, addr=023F): 487

Postgres is actually usable if I restart the service after the machine
has booted up. This fixes the problem triggering the error message:

"Error connecting to the server: server closed the connection unexpectedly.
 This probably means that the server terminated abnormally before or
while processing the request."

However, the above messages are still appearing.

Personally I have no issues with the installation process, although I
am the first to admit that sometimes I forget to check the log file
and the event logs; I'm used to Linux.

Regards
Andrew


-- 
___
Andrew J. P. Maclean
Centre for Autonomous Systems
The Rose Street Building J04
The University of Sydney  2006  NSW
AUSTRALIA
Ph: +61 2 9351 3283
Fax: +61 2 9351 7474
URL: http://www.acfr.usyd.edu.au/
___



Re: [GENERAL] Slight change in query leads to unexpected change in query plan

2009-06-22 Thread Tom Lane
Sam Mason  writes:
> On Mon, Jun 22, 2009 at 05:55:28PM -0400, Jack Orenstein wrote:
>> Why does adding the value restriction so radically change the execution 
>> plan?

> PG doesn't have any cross column statistics and hence it assumes that pk
> and value are uncorrelated.

Even if they are correlated, they aren't necessarily correlated in a
good way; the query plan Jack is hoping for could easily be pessimal.
We routinely see complaints where the planner does what he's hoping for
and gets burnt ...

If the speed of this particular type of query is critical, it might be
worth maintaining a 2-column index on (value, pk).  That would provide
guaranteed good performance since the desired rows form a contiguous run
in the index.
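
Tom's suggestion would look something like this (the index name is
illustrative; the table is the one from the original post):

```sql
-- In a (value, pk) index, all rows with a given value are already
-- sorted by pk, so the planner can walk the index in order and stop
-- as soon as the LIMIT is satisfied.
CREATE INDEX t_value_pk_idx ON t (value, pk);
```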

regards, tom lane



Re: [GENERAL] Inserting Values into Interval

2009-06-22 Thread Tom Lane
BlackMage  writes:
> I am having a small issue when entering values into the interval field. Say
> I want to enter a time of 2:03, two minutes and 3 seconds. When I insert
> that into an interval field, it comes up at 02:03:00, 2 hours, 3 minutes.
> The only way I've gotten around this so far is by doing 00:02:03. But I was
> wondering if there is another of inserting into an interval field where the
> values will start at the lower end first, so the result will be 00:02:03
> when 2:03 is inserted?

In 8.4 it'll be possible to do that by declaring the interval as MINUTE
TO SECOND, but there's no way around it in existing releases.  I'm not
sure I'd care to rely on that anyway, because any time you provide an
interval value that isn't *immediately* tied to a MINUTE TO SECOND
qualifier, it's going to get interpreted in the more standard way.
I think you'd be happier in the long run if you avoid depending on such
an ambiguous data format.
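
A sketch of the 8.4 behaviour described above (table and column names
are made up for illustration):

```sql
-- With a MINUTE TO SECOND qualifier, ambiguous input like '2:03'
-- is taken as 2 minutes 3 seconds rather than 2 hours 3 minutes.
CREATE TABLE track (len interval MINUTE TO SECOND);
INSERT INTO track VALUES ('2:03');
SELECT len FROM track;  -- expected: 00:02:03
```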

regards, tom lane



[GENERAL] Hourly dates

2009-06-22 Thread Andrew Maclean
Is this the best way of getting a table of hourly dates?

-- How to generate a table of dates at hourly intervals between two dates.

-- select timestamp 'epoch' + generate_series * interval '1 second' as
dates from generate_series(extract(epoch from date_trunc('hour',
timestamp '2001-02-16 20:38:40'))::bigint,extract(epoch from
date_trunc('hour', timestamp '2001-02-17 20:38:40'))::bigint, 3600)
select generate_series * interval '1 second' +  date_trunc('hour',
timestamp '2001-02-16 20:38:40') as dates
  from generate_series(0,extract(epoch from(date_trunc('hour',
timestamp '2001-02-17 20:38:40') - date_trunc('hour', timestamp
'2001-02-16 20:38:40')))::bigint, 3600)

The commented out query seems to take into account the timezone which
is not what I want.

Andrew


-- 
___
Andrew J. P. Maclean
Centre for Autonomous Systems
The Rose Street Building J04
The University of Sydney  2006  NSW
AUSTRALIA
Ph: +61 2 9351 3283
Fax: +61 2 9351 7474
URL: http://www.acfr.usyd.edu.au/
___



Re: [GENERAL] Replication

2009-06-22 Thread Greg Smith

On Mon, 22 Jun 2009, Gerry Reno wrote:

> We need something as good as MySQL Replication.

I certainly hope not, I was hoping for a reliable replication solution
instead.  Wow, the information you get searching for something like
"mysql replication corruption [replay log|bin log]" is scary.  I also
appreciate fun bits like how you'll get completely quiet master/slave
mismatches if you should do something crazy like, say, use LIMIT the wrong
way (see http://dev.mysql.com/doc/refman/5.0/en/replication-features.html
for more fun like that).

Anyway, you seem to be unaware that built-in replication for PostgreSQL
already is moving along, with an implementation that's just not quite
production quality yet, and might make it into the next version after 8.4
if things go well.  That's probably why it's not on the survey--everybody
knows that's important and it's already being worked on actively.

P.S. another Google search, this one for "postgresql replication support",
finds the mythical company that sells multiple products and support for
this purpose on hit #2 for me.  Or you could use the alternate approach of
looking at the jobs of everyone who's been giving you a hard time in
this thread...


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [GENERAL] postgresql-8.3.7 unexpected connection closures

2009-06-22 Thread Craig Ringer
On Tue, 2009-06-23 at 09:28 +1000, Andrew Maclean wrote:
> [snipity snip snip]
> 
> Notwithstanding all the previous discussion. I still think there is a
> problem with postgresql on windows.

I agree, but you don't seem to be prepared to take any steps to diagnose
what Pg might be interacting with to cause the problem.

This doesn't happen on all Windows installs. What makes yours different?
If we remove things that make it different from the default, does the
problem go away? If so, after what component was removed did the problem
go away?

If you won't help test by removing your AV software and checking if the
problem goes away, nobody can really help you since you're not willing
to go through any sort of fault isolation / diagnostic process.

Nobody here can wave a magic wand and make the problem go away; it's
going to take work on your side to help collect information that might
help identify the cause and lead to a fix.

--
Craig Ringer




Re: [GENERAL] Replication

2009-06-22 Thread Gerry Reno

Greg Smith wrote:
> On Mon, 22 Jun 2009, Gerry Reno wrote:
>> We need something as good as MySQL Replication.
>
> I certainly hope not, I was hoping for a reliable replication solution
> instead.  Wow, the information you get searching for something like
> "mysql replication corruption [replay log|bin log]" is scary.  I also
> appreciate fun bits like how you'll get completely quiet master/slave
> mismatches if you should do something crazy like, say, use LIMIT the
> wrong way (see
> http://dev.mysql.com/doc/refman/5.0/en/replication-features.html for
> more fun like that).

I didn't mean to imply that MySQL Replication was perfect. But I've been
using it for over three years with very few problems. And yes, with
statement-based replication you can get some interesting replication
anomalies if you're not careful. But that's true of any statement-based
replication with any database.

> Anyway, you seem to be unaware that built-in replication for
> PostgreSQL already is moving along, with an implementation that's just
> not quite production quality yet, and might make it into the next
> version after 8.4 if things go well.

No, I'm aware of this basic builtin replication. It was rather
disappointing to see it moved out of the 8.4 release. We need something
more than just basic master-slave replication, which is all this simple
builtin replication will provide. We need a real replication solution
that can handle statement-based and row-based replication. Multi-master
replication. Full cyclic replication chain setups. Simple master-slave
just doesn't cut it.

> That's probably why it's not on the survey--everybody knows that's
> important and it's already being worked on actively.

Ok, I just felt it should still be there. But I hope development
understands just how important good replication really is.

> P.S. another Google search, this one for "postgresql replication
> support", finds the mythical company that sells multiple products and
> support for this purpose on hit #2 for me. Or you could use the
> alternate approach of looking at the jobs of everyone who's been
> giving you a hard time in this thread...

I figured as much.

> --
> * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

Regards,
Gerry




Re: [GENERAL] Hourly dates

2009-06-22 Thread Steve Crawford

Andrew Maclean wrote:
> Is this the best way of getting a table of hourly dates?
>
> -- How to generate a table of dates at hourly intervals between two dates.
>
> -- select timestamp 'epoch' + generate_series * interval '1 second' as
> dates from generate_series(extract(epoch from date_trunc('hour',
> timestamp '2001-02-16 20:38:40'))::bigint,extract(epoch from
> date_trunc('hour', timestamp '2001-02-17 20:38:40'))::bigint, 3600)
> select generate_series * interval '1 second' +  date_trunc('hour',
> timestamp '2001-02-16 20:38:40') as dates
>   from generate_series(0,extract(epoch from(date_trunc('hour',
> timestamp '2001-02-17 20:38:40') - date_trunc('hour', timestamp
> '2001-02-16 20:38:40')))::bigint, 3600)
>
> The commented out query seems to take into account the timezone which
> is not what I want.
>
> Andrew


Depends on what you have available as input. If you know the starting 
time and number of records it's pretty easy:


Without time-zone:
select '2009-03-05 0100'::timestamp + generate_series(0,100) * '1 
hour'::interval;

...
2009-03-07 23:00:00
2009-03-08 00:00:00
2009-03-08 01:00:00
2009-03-08 02:00:00
2009-03-08 03:00:00
2009-03-08 04:00:00
...


With time-zone info:
select '2009-03-05 0100'::timestamptz + generate_series(0,100) * '1 
hour'::interval;

...
2009-03-07 23:00:00-08
2009-03-08 00:00:00-08
2009-03-08 01:00:00-08
2009-03-08 03:00:00-07
2009-03-08 04:00:00-07
2009-03-08 05:00:00-07
2009-03-08 06:00:00-07
...
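
If only the two endpoint timestamps are known, the number of steps can
be derived instead of hard-coded (a sketch building on the form above):

```sql
-- The hour count between the truncated endpoints drives the series.
SELECT date_trunc('hour', timestamp '2001-02-16 20:38:40')
       + generate_series(0,
           (extract(epoch from date_trunc('hour', timestamp '2001-02-17 20:38:40')
                             - date_trunc('hour', timestamp '2001-02-16 20:38:40'))
            / 3600)::int)
         * interval '1 hour' AS dates;
```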

Cheers,
Steve




Re: [GENERAL] Replication

2009-06-22 Thread Tom Lane
Greg Smith  writes:
> On Mon, 22 Jun 2009, Gerry Reno wrote:
>> We need something as good as MySQL Replication.

> I certainly hope not, I was hoping for a reliable replication solution 
> instead.  Wow is the information you get searching for something like 
> "mysql replication corruption [replay log|bin log]" scary.

My experience, stretching over more than five years now, is that mysql
replication fails its own regression tests a significant percentage of
the time ... in nonreproducible fashion of course, so it's hard to file
bug reports.  I'm aware of this because I package the thing for Red Hat,
and I run mysql's regression tests as part of that build, and close to
half the time the build fails in the regression tests, invariably in
the replication-related tests.  Never twice the same mind you; when
I resubmit the job, with the exact same SRPM, it usually works.

This might be some artifact of the Red Hat/Fedora build farm
environment, since my builds on my own workstation seldom fail.  But
it's persisted over multiple incarnations of that build farm and quite
a few versions of mysql.  I've never been able to pin it down enough
to file a bug report.

I can't say I'd trust mysql replication with any data I cared about.

regards, tom lane



Re: [GENERAL] Replication

2009-06-22 Thread Craig Ringer
On Mon, 2009-06-22 at 20:48 -0400, Gerry Reno wrote:

> > Anyway, you seem to be unaware that built-in replication for 
> > PostgreSQL already is moving along, with an implementation that's just 
> > not quite production quality yet, and might make into the next version 
> > after 8.4 if things go well. 

> No, I'm aware of this basic builtin replication. It was rather 
> disappointing to see it moved out of the 8.4 release. We need something 
> more that just basic master-slave replication which is all this simple 
> builtin replication will provide. We need a real replication solution 
> that can handle statement-based and row-based replication. Multi-master 
> replication. Full cyclic replication chain setups. Simple master-slave 
> just doesn't cut it.

Statement-based replication is, frankly, scary.

Personally I'd only be willing to use it if the database would guarantee
to throw an exception when any statement that may produce different
results on master and slave(s) was issued, like the
limit-without-order-by case mentioned on the MySQL replication docs.

Even then I don't really understand how it can produce consistent
replicas in the face of, say, two concurrent statements both pulling
values from a sequence. There would need to be some sort of side channel
to allow the master to tell the slave about how it allocated values from
the sequence.

My overall sentiment is "ick".

Re multi-master replication, out of interest: what needs does it satisfy
for you that master-slave doesn't?

- Scaling number of clients / read throughput in read-mostly workloads?

- Client-transparent fault-tolerance?

- ... ?

What limitations of master-slave replication with read-only slaves
present roadblocks for you?

- Client must connect to master for writes, otherwise master or slave,
  so must be more aware of connection management

- Client drivers have no way to transparently discover active master,
  must be told master hostname/ip

- ... ?

I personally find it difficult to understand how multi-master
replication can add much to throughput on write-heavy workloads. DBs are
often I/O limited after all, and if each master must write all the
others' changes you may not see much of a performance win in write heavy
environments. So: I presume multi-master replication is useful mainly in
read-mostly workloads ? Or do you expect throughput gains in write-heavy
workloads too?

If the latter, is it really multiple master replication you want rather
than a non-replica clustered database, where writes to one node don't
get replicated to the other nodes, they just get notified via some sort
of cache coherence protocol?

I guess my point is that personally I think it'd be helpful to know
_why_ you need more than what's on offer. What specific features pose
problems or would benefit you, how, and why. Etc.

> > That's probably why it's not on the survey--everybody knows that's 
> > important and it's already being worked on actively.
> Ok, I just felt it should still be there. But, I hope development 
> understands just how important good replication really is.

"development" appear to be well aware. They're also generally very
willing to accept help, testing, and users who're willing to trial early
efforts. Hint, hint. Donations of paid developer time to work on a
project you find to be commercially important probably wouldn't go
astray either.

-- 
Craig Ringer




Re: [GENERAL] Trigger Function and backup

2009-06-22 Thread Andrew Smith
Arndt,

Your website says rubyrep runs on Linux and Windows - am I going to have
difficulties if I want to try it on Solaris 10?

Andrew

2009/6/23 Arndt Lehmann 

> [snip: earlier thread, quoted in full in Arndt Lehmann's message above]
>
> Hi Merlin,
>
> you could also consider giving "rubyrep" a try.
> Like Slony it is also an open source, trigger based, asynchronous
> replication solution.
>
> Focus of rubyrep is easy setup.
> Tutorial, screencast and other information are available on the
> project website:
>http://www.rubyrep.org
>
> Best Regards,
>  Arndt Lehmann
>
> (Disclosure: I wrote rubyrep)
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Replication

2009-06-22 Thread Gerry Reno

Craig Ringer wrote:

On Mon, 2009-06-22 at 20:48 -0400, Gerry Reno wrote:

  
> Anyway, you seem to be unaware that built-in replication for
> PostgreSQL already is moving along, with an implementation that's just
> not quite production quality yet, and might make it into the next version
> after 8.4 if things go well.


  
No, I'm aware of this basic builtin replication. It was rather 
disappointing to see it moved out of the 8.4 release. We need something 
more that just basic master-slave replication which is all this simple 
builtin replication will provide. We need a real replication solution 
that can handle statement-based and row-based replication. Multi-master 
replication. Full cyclic replication chain setups. Simple master-slave 
just doesn't cut it.



> Statement-based replication is, frankly, scary.
>
> Personally I'd only be willing to use it if the database would guarantee
> to throw an exception when any statement that may produce different
> results on master and slave(s) was issued, like the
> limit-without-order-by case mentioned in the MySQL replication docs.
I don't know how it could guarantee that.  That's really why row-based 
is better.




> Even then I don't really understand how it can produce consistent
> replicas in the face of, say, two concurrent statements both pulling
> values from a sequence. There would need to be some sort of side channel
> to allow the master to tell the slave about how it allocated values from
> the sequence.
Sequences I deal with by setting up an offset and increment for each 
replica so that there are no conflicts.
You have to know the entire replication array size prior to setup.  I 
usually set increment to 10 and then I can offset up to 10 replicas.
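The offset/increment scheme described above can be sketched in SQL as follows. All object names here are hypothetical, and each CREATE SEQUENCE would run on a different replica:

```sql
-- Hypothetical sketch of non-overlapping key ranges for up to 10 replicas.
-- Every replica uses the same INCREMENT but a different START, so the
-- generated keys never collide:
--   replica 1 yields 1, 11, 21, ...; replica 2 yields 2, 12, 22, ...

-- On replica 1:
CREATE SEQUENCE orders_id_seq START WITH 1 INCREMENT BY 10;
-- On replica 2:
-- CREATE SEQUENCE orders_id_seq START WITH 2 INCREMENT BY 10;

CREATE TABLE orders (
    id   integer PRIMARY KEY DEFAULT nextval('orders_id_seq'),
    note text
);
```

The trade-off noted above applies: the increment caps the number of replicas you can ever add without renumbering.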

> My overall sentiment is "ick".
>
> Re multi-master replication, out of interest: what needs does it satisfy
> for you that master-slave doesn't?
>
> - Scaling number of clients / read throughput in read-mostly workloads?

yes

> - Client-transparent fault-tolerance?

yes.

> - ... ?
>
> What limitations of master-slave replication with read-only slaves
> present roadblocks for you?
failure of single master. 

> - Client must connect to master for writes, otherwise master or slave,
>   so must be more aware of connection management
>
> - Client drivers have no way to transparently discover the active master,
>   must be told the master hostname/ip
>
> - ... ?
>
> I personally find it difficult to understand how multi-master
> replication can add much to throughput on write-heavy workloads. DBs are
> often I/O limited after all, and if each master must write all the
> others' changes you may not see much of a performance win in write-heavy
> environments. So: I presume multi-master replication is useful mainly in
> read-mostly workloads? Or do you expect throughput gains in write-heavy
> workloads too?
>
> If the latter, is it really multi-master replication you want rather
> than a non-replica clustered database, where writes to one node don't
> get replicated to the other nodes, they just get notified via some sort
> of cache coherence protocol?
>
> I guess my point is that personally I think it'd be helpful to know
> _why_ you need more than what's on offer. What specific features pose
> problems or would benefit you, how, and why. Etc.
>
> That's probably why it's not on the survey--everybody knows that's
> important and it's already being worked on actively.
Ok, I just felt it should still be there. But, I hope development 
understands just how important good replication really is.



> "development" appear to be well aware. They're also generally very
> willing to accept help, testing, and users who're willing to trial early
> efforts. Hint, hint. Donations of paid developer time to work on a
> project you find to be commercially important probably wouldn't go
> astray either.


Regards,
Gerry



Re: [GENERAL] Information about columns

2009-06-22 Thread Tom Lane
Dario Teixeira  writes:
> *However*, if I create a new type (which has an associated pg_class entry),
> and define a function which returns a SETOF that type, RowDescription will
> not tell me its OID.  For example:
> ...
> Is this a bug or a conscious decision?

It's intentional; IIRC, the current behavior is defined that way because
that's what the JDBC driver needs to implement the JDBC specs.  Putting
information about composite types where information about tables is
expected would confuse the heck out of existing client code.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Replication

2009-06-22 Thread Craig Ringer
On Mon, 2009-06-22 at 21:29 -0400, Gerry Reno wrote:

> I don't know how it could guarantee that.  That's really why row-based
> is better.

Yep, especially in the face of things like user PL functions, C
functions, etc.

This page:

http://dev.mysql.com/doc/refman/5.0/en/replication-features-functions.html

is downright alarming, and (implicitly) says quite enough about how
statement-based replication is a really, REALLY bad idea.

Rather than replicating sets of changed rows, though, I suspect that
block-level replication using the WAL is probably more efficient.
Certainly it'll be easier on the slave in terms of the work required to
keep up with the master.

I guess block-level replication isn't much good for multi-master,
though, since you'd be spending half your time finding out what the
other masters were doing and what their state was, or telling them about
yours. (I guess that's the case anyway to some extent, though, any time
you have concurrent statements on different masters using the same data
and one or more of them is altering it).

> Sequences I deal with by setting up an offset and increment for each
> replica so that there are no conflicts.

Ah, so you don't actually care if the replicas are identical - you
expect things like different primary keys on master and replica(s)?

How do your applications cope if they switch from one replica to another
and suddenly primary key identifiers are different?

> > What limitations of master-slave replication with read-only slaves
> > present roadblocks for you?
> >   
> failure of single master.

For that, read-only slave + heartbeat based failover with STONITH (shoot
the other node in the head) by something like IPMI remote-poweroff or a
USB-controlled power switch would be sufficient.

The only part of the requirements for this that PG can't already satisfy
is synchronous replication - the current WAL-based replication doesn't
guarantee that the slave has the changes before the client's commit
returns successfully, so recent changes that the client thinks are
committed might be lost on failover. Synchronous replication is, of
course, what's being worked on right now, partly to address just this
issue and partly to allow for read-only reporting slaves.

This technique is well established, very robust, and it's not hard to
implement in a way that ensures that the slave - when it takes over as
master - claims the master's MAC address and IP address so clients
barely notice a change.

With Pg it'd break any existing connections, but any database
application worth a damn must be able to handle re-issuing transactions
due to deadlocks, resource exhaustion, admin statement cancellation etc
anyway.

-- 
Craig Ringer


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Replication

2009-06-22 Thread Scott Marlowe
On Mon, Jun 22, 2009 at 4:51 PM, Gerry Reno wrote:
> Joshua D. Drake wrote:
>>
>> On Mon, 2009-06-22 at 18:35 -0400, Gerry Reno wrote:
>>> Joshua D. Drake wrote:
>>>
>>> It is true.  Otherwise show me a viable replication offering for
>>> postgresql that I can put into production and obtain support for it.
>>
>> Well, you can get support for Slony (known to to be a bit complicated
>> but stable and flexible).
>
> I've already tried Slony last year and unless something major has changed it
> is not viable.  I cannot have replication that just stops for no known
> reason.

I've been running slony since 1.0 came out, and have NEVER had it just
stop replication for no known reason.  ever.  Your inability to use it
tells me much less about slony than it does about you.

>> You can also get support for Londiste (which
>> is used in production by Skype... I think that speaks for itself).
>
> Londiste is beta.  The fact that Skype uses it is because it's part of
> Skytools which is their product.  They may want to run their own beta stuff.
>  I don't.

So, if they said it was general release, but it sucked, you'd try it,
but since they say it's beta, no way?  Wow.  Just wow.  The amount of
dumb in that sentence is not measurable with modern instrumentation.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgresql-8.3.7 unexpected connection closures

2009-06-22 Thread Tom Lane
Andrew Maclean  writes:
> Messages in the log are consistently of the form:
> 2009-06-23 08:28:26 EST WARNING:  worker took too long to start; cancelled
> FATAL:  could not reattach to shared memory (key=252, addr=023F): 487
> 2009-06-23 08:35:58 EST WARNING:  worker took too long to start; cancelled
> FATAL:  could not reattach to shared memory (key=252, addr=023F): 487

I suspect the causality is actually the other way around from what you
imply here: an autovac worker process fails to start (generating the
FATAL ... 487 message) and after awhile the autovac launcher figures out
that the worker died and issues the WARNING.  It might be hard to tell
that since the launcher would probably try to launch another worker
immediately after realizing the previous one died.  Looking at the
startup sequence to see which one appears first would be useful.

The "reattach to shared memory" problem is known, what we don't know is
exactly what causes it or how to fix it.  As noted, we need some people
who can reproduce the problem consistently (none of the developers can)
to poke into it and find out what aspect of their systems causes it.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Select ranges based on sequential breaks

2009-06-22 Thread Joel Nothman

On Tue, 23 Jun 2009 04:41:44 +1000, Mike Toews  wrote:
Window functions appear to be the best solution for this style of  
problem, and I'm looking forward to their applications. However, I'm  
sticking with 8.3 for at least a year, so I'm not able to explore this  
solution yet. For now, I can only post-process the output in a non-SQL  
environment. I also need to do other fun stuff, like cumulative sums,  
which is also challenging with SQL, but much easier and intuitive with R.


As a largely procedural programmer, the PL/SQL solution is quite appealing  
to me, and would be similarly simple to calculate cumulative sums. The  
integration of SELECT statements within PL/SQL also seems much tighter  
than with other PL languages. Unfortunately, one can't send a cursor or a  
set of results directly as a PL argument.


I'm having a skim through Celko's chapter 24, but it doesn't seem to be  
close to my needs either.


On Tue, 23 Jun 2009 08:05:14 +1000, Mike Toews  wrote:
...

# Determine where the rows are different; 1=different rows, 0=same rows
dat$breaks <- ifelse(dat$bin != c(TRUE, as.character(dat$bin[-nrow(dat)])), 1, 0)


# Determine where the continuous parts are:
dat$part <- factor(cumsum(dat$breaks))


Yes, as far as I can tell, this is almost identical to my WINDOW-based  
solution in finding when there is a change, marking it with 0 or 1 and then
using a cumulative sum to number the partitions. This could be similarly
done in PL/SQL but it seemed more sensible to just do the whole thing
rather than using GROUP BY after enumeration.
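For reference, the 8.4 window-function version of that enumeration might look roughly like this (the table and column names `dat`, `ts`, and `bin` are assumptions drawn from the quoted R snippet, not from an actual schema):

```sql
-- Hypothetical sketch: mark a break (1) wherever bin differs from the
-- previous row's bin, then number the contiguous runs with a running sum.
SELECT ts, bin,
       sum(brk) OVER (ORDER BY ts) AS part
FROM (SELECT ts, bin,
             CASE WHEN bin IS DISTINCT FROM lag(bin) OVER (ORDER BY ts)
                  THEN 1 ELSE 0 END AS brk
      FROM dat) AS marked;
```

IS DISTINCT FROM handles the first row (where lag() returns NULL) without a special case.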


- Joel

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Replication

2009-06-22 Thread Gerry Reno

Craig Ringer wrote:

On Mon, 2009-06-22 at 21:29 -0400, Gerry Reno wrote:

  

>> I don't know how it could guarantee that.  That's really why row-based
>> is better.

> Yep, especially in the face of things like user PL functions, C
> functions, etc.
>
> This page:
>
> http://dev.mysql.com/doc/refman/5.0/en/replication-features-functions.html
>
> is downright alarming, and (implicitly) says quite enough about how
> statement-based replication is a really, REALLY bad idea.
>
> Rather than replicating sets of changed rows, though, I suspect that
> block-level replication using the WAL is probably more efficient.
> Certainly it'll be easier on the slave in terms of the work required to
> keep up with the master.
>
> I guess block-level replication isn't much good for multi-master,
> though, since you'd be spending half your time finding out what the
> other masters were doing and what their state was, or telling them about
> yours. (I guess that's the case anyway to some extent, though, any time
> you have concurrent statements on different masters using the same data
> and one or more of them is altering it).

  

>> Sequences I deal with by setting up an offset and increment for each
>> replica so that there are no conflicts.



> Ah, so you don't actually care if the replicas are identical - you
> expect things like different primary keys on master and replica(s)?
>
> How do your applications cope if they switch from one replica to another
> and suddenly primary key identifiers are different?
Here is a link that describes the technique:  
http://www.onlamp.com/pub/a/onlamp/2006/04/20/advanced-mysql-replication.html?page=1




Regards,
Gerry




Re: [GENERAL] Replication

2009-06-22 Thread Conrad Lender
On 23/06/09 03:44, Scott Marlowe wrote:
> On Mon, Jun 22, 2009 at 4:51 PM, Gerry Reno wrote:
>> Londiste is beta.  The fact that Skype uses it is because it's part
>> of Skytools which is their product.  They may want to run their own
>> beta stuff. I don't.
> 
> So, if they said it was general release, but it sucked, you'd try it,
> but since they say it's beta, no way?  Wow.  Just wow.  The amount
> of dumb in that sentence is not measurable with modern
> instrumentation.

To be fair, the "beta" label has been abused a lot in recent years;
and what's more, it has been used as an excuse to refuse support (I'm
looking at Google here). Another point would be that Skype has come
under attack for using what basically amounts to a black box protocol in
their main application - many security-minded people are sceptical of
the company for this reason, and I can't blame them. That said, I do use
pgbouncer, which is also a Skype project (released under the BSD
license). After some casual code review I found it to be of good
quality, and I'm now using it in production environments. I don't think
it's so unreasonable to be questioning projects which are only available
as "betas". There was a time when "beta" meant caveat emptor, this
product is not fully tested, and if it breaks, we'd like to hear about
it, but we won't be surprised. Trusting such a product with database
replication may well work, but it's a risk not everybody's willing to take.


  - Conrad

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Replication

2009-06-22 Thread Craig Ringer
On Mon, 2009-06-22 at 22:20 -0400, Gerry Reno wrote:

> Here is a link that describes the technique:
> http://www.onlamp.com/pub/a/onlamp/2006/04/20/advanced-mysql-replication.html?page=1

Ah. You were referring to multiple-master replication, and your
reference to setting non-overlapping sequences referred to avoiding
collisions caused by inserts on two different masters. Yes, using
non-overlapping allocation ranges for sequences is indeed one way to
handle that, but it's not actually related to what I was talking about
anyway.


What I was referring to in the parent post was an issue with
statement-based replication of concurrent statements sharing a sequence.
It's completely unrelated; both statements are running on the SAME
server (master) and replicating to the slave. For example, take two
concurrent statements each of which inserts 10 generated rows into the
dummy table 'x':

CREATE SEQUENCE x_id_seq;
CREATE TABLE x (
  a INTEGER PRIMARY KEY DEFAULT nextval('x_id_seq'),
  b INTEGER NOT NULL
);



CONNECTION (1) TO MASTER  CONNECTION (2) TO MASTER
- --
Issues INSERT INTO x (a,b)
SELECT nextval('x_id_seq'),1
FROM generate_series(0,9);
  Issues INSERT INTO x (a,b)
  SELECT nextval('x_id_seq'),2
  FROM generate_series(0,9);

nextval() returns 1
nextval() returns 2 
nextval() returns 3
  nextval() returns 4
nextval() returns 5
  nextval() returns 6
nextval() returns 7
nextval() returns 8
  nextval() returns 9
  nextval() returns 10
  nextval() returns 11
nextval() returns 12

... etc


If you issue the same two statements on the slave, the ordering in which
those nextval() calls are interleaved will be different. So, while on
the master according to the example above table 'x' would contain:

 a b
(1,1)
(2,1)
(3,1)
(4,2)
(5,1)
(6,2)
(7,1)
(8,1)
(9,2)
(10,2)
(11,2)
...


on the slave it might land up containing something like:

 a b
(1,1)
(2,2)
(3,2)
(4,1)
(5,2)
(6,1)
(7,1)
(8,2)
(9,1)
(10,1)
(11,2)

so your slave and master contain TOTALLY DIFFERENT DATA. Yet, there's
nothing wrong with the ordering of execution on the master being
non-deterministic, as we still got what we asked for. We have 10 rows
with unique primary keys and b=1, and ten rows with unique primary keys
and b=2 . We don't actually care what those primary key values are since
they're synthetic primary keys, we only care that they're unique. In a
master/slave situation, though, we also care that the SAME primary key
identifies the SAME entity on both master and slave, and that won't
happen with statement-based replication when concurrent statements
interleave in non-deterministic ways.

Of course, it's rather nice in performance terms that such statements
CAN be interleaved without synchronisation or locking. In fact, that's
why PostgreSQL sequences exist.

In this particular case, the server could work around it by logging its
selection of generated values to some sort of side channel (akin to
MySQL's replication binlog) so the slave can use that as its source for
them. That's kind of error prone, though, as it requires every such
function ( nextval, random(), etc ) to have support for replication
manually added, and will result in hopelessly out-of-sync slaves if a
function isn't handled. It also doesn't provide an answer for other
non-deterministic result sets like use of a function in a result set
with LIMIT without ORDER BY .
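As a concrete illustration of that last case (a hypothetical statement, not one from the thread): replaying the statement below on a slave can modify a different set of rows than it did on the master, because LIMIT without ORDER BY leaves the choice of rows to the planner.

```sql
-- Which 10 rows the subquery returns is unspecified without ORDER BY,
-- so master and slave can end up updating different rows.
UPDATE x SET b = 0
WHERE a IN (SELECT a FROM x LIMIT 10);
```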

The problem is that if you do statement-based replication, the order in
which reads from the sequence by each statement are interleaved is
undefined and depends on the OS's I/O and processor scheduling. The
slave will not produce the same ordering, so the same statements
executed on the slave will result in inserted rows having different
generated keys than on the master.

MySQL appears to tackle these problems by
   look! a cassowary! Over there!

Anyway, what was I saying? Oh, yes, MySQL appears to ignore these
problems or expect a paranoidly careful admin to avoid them. Some
functions are just broken and don't replicate properly; some statements
will produce wrong results on the slave, etc.

You won't EVER see that sort of thing in PostgreSQL.

So ... it doesn't seem likely that statement-level replication would
ever get far in Pg because of nasty issues like this one.

That was my point re concurrent execution of statements. Nothing to do
with ensuring key uniqueness without inter-node synchronisation in
multi-master environments.

Block-level master/slave synchronous replication, however, is already on
the way. (Also, Slony provides row-level master/slave replication that
seems to work well for a lot of people, though it's widely admitted to
be a bit of a pain to work with and not particularly nice.)

Re: [GENERAL] Replication

2009-06-22 Thread Scott Marlowe
On Mon, Jun 22, 2009 at 8:50 PM, Conrad Lender wrote:
> On 23/06/09 03:44, Scott Marlowe wrote:
>> On Mon, Jun 22, 2009 at 4:51 PM, Gerry Reno wrote:
>>> Londiste is beta.  The fact that Skype uses it is because it's part
>>> of Skytools which is their product.  They may want to run their own
>>> beta stuff. I don't.
>>
>> So, if they said it was general release, but it sucked, you'd try it,
>> but since they say it's beta, no way?  Wow.  Just wow.  The amount
>> of dumb in that sentence is not measurable with modern
>> instrumentation.
>
> To be fair, the "beta" label has been abused a lot in the last years;
> and what's more, it has been used as an excuse to refuse support (I'm
> looking at Google here). Another point would be that Skype has come
> under attack for using what basically amounts to a black box protocol in
> their main application - many security-minded people are sceptical of
> the company for this reason, and I can't blame them. That said, I do use
> pgbouncer, which is also a Skype project (released under the BSD
> license). After some casual code review I found it to be of good
> quality, and I'm now using it in production environments. I don't think
> it's so unreasonable to be questioning projects which are only available
> as "betas". There was a time when "beta" meant caveat emptor, this
> product is not fully tested, and if it breaks, we'd like to hear about
> it, but we won't be surprised. Trusting such a product with database
> replication may well work, but it's a risk not everybody's willing to take.

Beta or alpha or final or production, they all mean nothing unless
they are applied to a specific piece of code and it's rep.  I've seen
plenty of software that was supposedly supported that was never fixed
or fixed at a leisurely pace (see mysql and packaging mistakes and
innodb order by desc bugs for examples).  I've used "alpha" products
in limited, well tested roles in production that worked and worked
well.  OpenSSL, which I trust to do a good job, is 0.9.something right
now, which screams not "release" to me.

What makes code production worthy is that YOU have tested it
thoroughly and that YOU guarantee it to work or you'll fix it as long
as it's used in a way you can test for properly before upgrade /
update deployments.  How fast do fixes come out? How well is it
maintained?  An actively maintained beta may be a better answer in a
moving landscape because it can keep up.  Beta means beta.  And what
that means to an individual developer may not be what you expect it to
be.  The risk implied purely by the naming of the release is
non-existent IF IT'S BEEN TESTED PROPERLY.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Replication

2009-06-22 Thread Scott Marlowe
On Mon, Jun 22, 2009 at 8:59 PM, Craig
Ringer wrote:

> So ... it doesn't seem likely that statement-level replication would
> ever get far in Pg because of nasty issues like this one.

It's exactly what pg_pool does, and you can choose it if you know what
you're doing.  But yes, it's usually a bad fit for replication by
itself.

> That was my point re concurrent execution of statements. Nothing to do
> with ensuring key uniqueness without inter-node synchronisation in
> multi-master environments.
>
> Block-level master/slave synchronous replication, however, is already on
> the way. (Also, Slony provides row-level master/slave replication that
> seems to work well for a lot of people, though it's widely admitted to
> be a bit of a pain to work with and not particularly nice.)

I think it's real easy to work with, once you understand that "it's
boss". I.e. you do things the slony way, or get used to recreating /
resubscribing a lot of times during maintenance windows when you can
run on one db.  The mis-feature of no ability to drop tables caught me
out.  Now we don't drop tables, period.  We rename and alter to get
around that.  Once I told the developers not to drop tables in order
to change them, things got better.  Really it was bad habits learned
from other dbs.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Replication

2009-06-22 Thread Craig Ringer
On Mon, 2009-06-22 at 21:12 -0600, Scott Marlowe wrote:
> On Mon, Jun 22, 2009 at 8:59 PM, Craig
> Ringer wrote:
> 
> > So ... it doesn't seem likely that statement-level replication would
> > ever get far in Pg because of nasty issues like this one.
> 
> It's exactly what pg_pool does, and you can choose it if you know what
> you're doing.  But yes, it's usually a bad fit for replication by
> itself.

Ah - I didn't realise that pg_pool did statement-based inter-node
replication; I was under the impression that it was primarily for
distribution of read-only queries between multiple clone nodes. Am I
confusing it and pgbouncer? I haven't had any need for tools intended to
scale up client counts, and haven't looked into them much.

In any case, I have a _bit_ less of a problem with the notion of
statement-level replication when it's add-on software rather than a core
part of the DB. To me, it seems like you should be able to trust the DB
to just get it right, and not have to worry about whether something's
safe with replication, etc.

If you have some kind of add-on or intermediary messing with things, at
least it's clear where responsibility lies - with the admin and the
add-on. The DB just does what it's told, consistently and just like
normal. Even so, I'm still very definitely not a fan of something that
can essentially cause silent data corruption as a result of minor
oversights in statement design.

I'm a bit leery of Slony for related reasons - because it messes with
the way Pg works to the point where, as you noted, you can't even drop
tables. Clients have to be very wary of the replication system's quirks
and issues, which I intensely dislike.

I know all these things, like MySQL's statement-level replication, have
their uses and have real advantages in situations where it's worth
trading coding pain and admin time for performance (to the point where
they may in some situations be BETTER than clustering / mm replication),
but I do think they're still VERY ugly. I personally really wouldn't
want to see statement level replication in particular in the core.

As you can probably imagine, I'm really rather excited about the coming
work on synchronous block-level replication to a read-only slave. I'd
also be pretty happy to see DDL triggers and/or triggers on system
tables to allow things like Slony to be a bit less ugly to work with.

> Once I told the developers not to drop tables in order
> to change them, things got better.  Really it was bad habits learned
> from other dbs.

I can understand working around such limitations, but it seems a stretch
to call use of the DB's normal capabilities "bad habits". Well, unless
you mean people using DROP TABLE / CREATE TABLE instead of just using
ALTER TABLE to do the job - but even then, there are times when a drop
and re-create is the preferable option in the absence of external
limitations like Slony.

-- 
Craig Ringer


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Trigger Function and backup

2009-06-22 Thread Arndt Lehmann
On Jun 23, 10:20 am, laconi...@gmail.com (Andrew Smith) wrote:
> Arndt,
>
> Your website says rubyrep runs on Linux and Windows - am I going to have
> difficulties if I want to try it on Solaris 10?
>
> Andrew
>
> 2009/6/23 Arndt Lehmann 
>
> > On Jun 16, 7:48 pm, nishkars...@rediffmail.com (Nishkarsh) wrote:
> > > Hi Merlin, thanks for the detailed input.
>
> > > As per ur suggestion i will try to implement Slony-I.
>
> > > I think i will need some help to do it.
>
> > > I am useing Postgres 8.3.7, on Windows.
>
> > > I was following the Slony-I example in the help for pgAdmin III. I am
> > able
> > > to perform the steps from 1-7. Step 8 : create Slony-I cluster i am
> > getting
> > > a msg in the interface
>
> > > "Slony-I creation script no available; only join possible"
>
> > > On doing some research i found some scripts to be copied (I was not able
> > to
> > > find very clear instruction) or give slony-I path. i tried all that but
> > was
> > > not able to move ahead.
>
> > > Can u plz guide me through &-(%-|
>
> > > Regards
> > > Nishkarsh
>
> > > Merlin Moncure-2 wrote:
>
> > > > On Mon, Jun 15, 2009 at 4:29 AM, Nishkarsh
> > > > wrote:
> > > >> Hello every one,
>
> > > >> I am new to databases. I am using Postgres 8.2 (Migrating to 8.3.7 in
> > few
> > > >> days) on windows platform.
>
> > > >> I had tried using Slony-I for replication and was not able to create a
> > > >> cluster.
>
> > > >> After struggling for some time i decide to implement a way around to
> > take
> > > >> differential backup. As the tables getting changed were very less.
>
> > > >> Here is what i intend to do:
>
> > > >> - Write a trigger for each of the tables in concern
> > > >> - Some how write a function which can copy / execute the same query in
> > > >> another temp Db on the same physical system (I have no idea how to do
> > > >> that)
> > > >> - Take a backup of temp DB which will be the differential backup of DB
> > > >> (We
> > > >> need to clear temp db after backup)
>
> > > >> Am i going in the right direction?
> > > >> Is there any way i can implement it.
> > > >> Any help will be really of great help
>
> > > > Generating a full trigger based replication system on your own is
> > > > IMNSHO crazy.  Slony is the best solution to this problem (trigger
> > > > replication with postgres) that I know of, and is probably better than
> > > > what any one person could come up with in a reasonable amount of time.
> > > > Probably, your best course of action if you need to get things running
> > > > right now is to give slony another go (why did you not succeed?).
>
> > > > Hand written trigger replication is ok if you need to copy, say, a
> > > > couple of tables or you have some other very specific requirement.  In
> > > > particular, copying an insert to a mirror database with trigger
> > > > function wrapping dblink is a snap (updates are more problematic, but
> > > > doable).  Of course, you need to figure out how to deal with schema
> > > > updates and other issues that plague replication systems such as
> > > > volatile data in cascading triggers (just to name one).  General
> > > > purpose trigger replication is a huge project...
>
> > > > It sounds to me that what you really want is the 'hot standby' feature
> > > > that unfortunately missed the cut for 8.4.  Hot standby is probably
> > > > the easiest way to mirror a database for purposes of read only
> > > > querying.  There are no triggers to worry about, just a few .conf
> > > > settings and some other setup to get going (more or less, it isn't
> > > > finalized yet).  So maybe, waiting for hot standby (or even, digging
> > > > up a hot standby patch and trying to apply it against 8.4 if you're
> > > > adventurous) is the answer.
>
> > > > Another possibility is to look at statement level replication, like
> > > > pgpool.
>
> > > > merlin
>
>
>
>
> > Hi Merlin,
>
> > you could also consider giving "rubyrep" a try.
> > Like Slony it is also an open source, trigger based, asynchronous
> > replication solution.
>
> > Focus of rubyrep is easy setup.
> > Tutorial, screencast and other information are available on the
> > project website:
> >    http://www.rubyrep.org
>
> > Best Regards,
> >  Arndt Lehmann
>
> > (Disclosure: I wrote rubyrep)
>
>
>
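Merlin's remark above that "copying an insert to a mirror database with a trigger function wrapping dblink is a snap" can be sketched roughly as follows. This is a minimal illustration only, not a full replication solution: the table, columns, and connection string are invented, and it assumes the contrib/dblink module is installed on the primary.

```sql
-- Hypothetical sketch: mirror inserts on "orders" into a standby database.
-- Assumes contrib/dblink is installed; names and connstr are made up.
CREATE OR REPLACE FUNCTION mirror_orders_insert() RETURNS trigger AS $$
BEGIN
    -- Build and run the same INSERT on the mirror; quote_literal guards
    -- the text value against quoting problems.
    PERFORM dblink_exec(
        'host=standby dbname=mirror user=repl',
        'INSERT INTO orders (id, item, qty) VALUES ('
            || NEW.id || ', ' || quote_literal(NEW.item) || ', '
            || NEW.qty || ')'
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_mirror
    AFTER INSERT ON orders
    FOR EACH ROW EXECUTE PROCEDURE mirror_orders_insert();
```

As Merlin notes, updates, deletes, schema changes, and error handling (what if the mirror is down?) are where the real complexity lives, which is why a general-purpose system like Slony is the safer choice.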

Hi Andr

Re: [GENERAL] Hourly dates

2009-06-22 Thread Andrew Maclean
Thanks for this, I can easily determine the number of records I want.

Andrew


On Tue, Jun 23, 2009 at 10:52 AM, Steve
Crawford wrote:
> Andrew Maclean wrote:
>>
>> Is this the best way of getting a table of hourly dates?
>>
>> -- How to generate a table of dates at hourly intervals between two dates.
>>
>> -- select timestamp 'epoch' + generate_series * interval '1 second' as
>> dates from generate_series(extract(epoch from date_trunc('hour',
>> timestamp '2001-02-16 20:38:40'))::bigint,extract(epoch from
>> date_trunc('hour', timestamp '2001-02-17 20:38:40'))::bigint, 3600)
>> select generate_series * interval '1 second' +  date_trunc('hour',
>> timestamp '2001-02-16 20:38:40') as dates
>>  from generate_series(0,extract(epoch from(date_trunc('hour',
>> timestamp '2001-02-17 20:38:40') - date_trunc('hour', timestamp
>> '2001-02-16 20:38:40')))::bigint, 3600)
>>
>> The commented out query seems to take into account the timezone which
>> is not what I want.
>>
>> Andrew
>>
>>
>>
>
> Depends on what you have available as input. If you know the starting time
> and number of records it's pretty easy:
>
> Without time-zone:
> select '2009-03-05 0100'::timestamp + generate_series(0,100) * '1
> hour'::interval;
> ...
> 2009-03-07 23:00:00
> 2009-03-08 00:00:00
> 2009-03-08 01:00:00
> 2009-03-08 02:00:00
> 2009-03-08 03:00:00
> 2009-03-08 04:00:00
> ...
>
>
> With time-zone info:
> select '2009-03-05 0100'::timestamptz + generate_series(0,100) * '1
> hour'::interval;
> ...
> 2009-03-07 23:00:00-08
> 2009-03-08 00:00:00-08
> 2009-03-08 01:00:00-08
> 2009-03-08 03:00:00-07
> 2009-03-08 04:00:00-07
> 2009-03-08 05:00:00-07
> 2009-03-08 06:00:00-07
> ...
>
> Cheers,
> Steve
>
>
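For what it's worth, the epoch arithmetic in the original query can be avoided entirely on 8.4, which (if I recall correctly) adds timestamp-valued forms of generate_series. A minimal sketch using the dates from the original post:

```sql
-- Sketch: hourly timestamps between two dates using the timestamp form of
-- generate_series (8.4+; earlier versions need the epoch-based approach).
SELECT generate_series(
           date_trunc('hour', timestamp '2001-02-16 20:38:40'),
           date_trunc('hour', timestamp '2001-02-17 20:38:40'),
           interval '1 hour') AS dates;
```

Using plain timestamp (rather than timestamptz) also sidesteps the time-zone behaviour Andrew did not want.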



-- 
___
Andrew J. P. Maclean
Centre for Autonomous Systems
The Rose Street Building J04
The University of Sydney  2006  NSW
AUSTRALIA
Ph: +61 2 9351 3283
Fax: +61 2 9351 7474
URL: http://www.acfr.usyd.edu.au/
___



Re: [GENERAL] postgresql-8.3.7 unexpected connection closures

2009-06-22 Thread Andrew Maclean
Thanks for this. I can't take the machines out of service at present,
but when I can, I'll look into shutting down services and seeing what
happens.


Andrew



On Tue, Jun 23, 2009 at 11:49 AM, Tom Lane wrote:
> Andrew Maclean  writes:
>> Messages in the log are consistently of the form:
>> 2009-06-23 08:28:26 EST WARNING:  worker took too long to start; cancelled
>> FATAL:  could not reattach to shared memory (key=252, addr=023F): 487
>> 2009-06-23 08:35:58 EST WARNING:  worker took too long to start; cancelled
>> FATAL:  could not reattach to shared memory (key=252, addr=023F): 487
>
> I suspect the causality is actually the other way around from what you
> imply here: an autovac worker process fails to start (generating the
> FATAL ... 487 message) and after awhile the autovac launcher figures out
> that the worker died and issues the WARNING.  It might be hard to tell
> that since the launcher would probably try to launch another worker
> immediately after realizing the previous one died.  Looking at the
> startup sequence to see which one appears first would be useful.
>
> The "reattach to shared memory" problem is known, what we don't know is
> exactly what causes it or how to fix it.  As noted, we need some people
> who can reproduce the problem consistently (none of the developers can)
> to poke into it and find out what aspect of their systems causes it.
>
>                        regards, tom lane
>



-- 
___
Andrew J. P. Maclean
Centre for Autonomous Systems
The Rose Street Building J04
The University of Sydney  2006  NSW
AUSTRALIA
Ph: +61 2 9351 3283
Fax: +61 2 9351 7474
URL: http://www.acfr.usyd.edu.au/
___



Re: [GENERAL] Replication

2009-06-22 Thread Arndt Lehmann
Hi Craig,

just wanted to mention that there is a new open-source solution
available that now also enables asynchronous, row-based, master-master
replication of PostgreSQL databases.

Name: rubyrep

Project website with full feature list, step-by-step tutorial and
screencast (from zero to running replication in under 5 minutes)
available here:
http://www.rubyrep.org

Best Regards,
  Arndt Lehmann




[GENERAL] after vacuum analyze, explain still wrong

2009-06-22 Thread Sim Zacks
I ran VACUUM ANALYZE and immediately afterwards ran my query, but the
estimated row counts are still way off. I suspect it is something in my
configuration, but I don't know what.
I pasted my postgresql.conf file under the explain results.

Thank you
Sim

GroupAggregate  (cost=4542.87..4543.12 rows=1 width=32) (actual time=150947.014..150984.246 rows=83 loops=1)
  ->  GroupAggregate  (cost=4542.87..4543.03 rows=1 width=120) (actual time=150946.978..150975.130 rows=3754 loops=1)
        ->  Sort  (cost=4542.87..4542.88 rows=3 width=120) (actual time=150946.949..150950.724 rows=7123 loops=1)
              Sort Key: a.assembliesbatchid, a.duedate, a.assemblyid, a.assemblyname, a.ownerid, a.partid, a.quantity, a.stock, a.prioruse, a.units, a.qtyperunit, a.priorusebatch
              ->  Nested Loop Left Join  (cost=3634.74..4542.85 rows=3 width=120) (actual time=47753.176..150896.984 rows=7123 loops=1)
                    Join Filter: (e.partid = a.partid)
                    ->  GroupAggregate  (cost=2679.49..2679.56 rows=1 width=89) (actual time=47702.211..47873.649 rows=3754 loops=1)
                          ->  Sort  (cost=2679.49..2679.50 rows=1 width=89) (actual time=47702.193..47755.292 rows=41197 loops=1)
                                Sort Key: a.assembliesbatchid, a.duedate, a.assemblyid, a.assemblyname, a.ownerid, a.partid, a.quantity, a.stock, a.units, a.qtyperunit, a.leadfree
                                ->  Nested Loop Left Join  (cost=2556.68..2679.48 rows=1 width=89) (actual time=9767.220..46933.103 rows=41197 loops=1)
                                      Join Filter: ((a.partid = b.partid) AND ((a.ownerid IS NULL) OR (a.ownerid = 1)) AND (a.leadfree = q.leadfree) AND ((a.duedate > c.duedate) OR ((a.duedate = c.duedate) AND (a.assembliesbatchid > c.assembliesbatchid))))
                                      ->  GroupAggregate  (cost=2381.63..2381.68 rows=1 width=82) (actual time=9754.658..9773.515 rows=3754 loops=1)
                                            ->  Sort  (cost=2381.63..2381.64 rows=1 width=82) (actual time=9754.641..9757.533 rows=3754 loops=1)
                                                  Sort Key: d.assembliesbatchid, d.duedate, a.assemblyid, a.assemblyname, c.ownerid, e.partid, COALESCE(c.stock, 0::bigint), d.units, e.quantity, a.leadfree
                                                  ->  Nested Loop Left Join  (cost=145.04..2381.62 rows=1 width=82) (actual time=39.045..9726.318 rows=3754 loops=1)
                                                        ->  Nested Loop Left Join  (cost=145.04..2381.33 rows=1 width=74) (actual time=38.955..9678.057 rows=3754 loops=1)
                                                              Join Filter: ((c.partid = e.partid) AND leadcompcheck_ab(a.leadfree, c.leadstateid))
                                                              ->  Nested Loop  (cost=145.04..2366.26 rows=1 width=62) (actual time=0.791..83.482 rows=3754 loops=1)
                                                                    ->  Nested Loop Left Join  (cost=145.04..2365.59 rows=1 width=28) (actual time=0.721..52.840 rows=3754 loops=1)
                                                                          Filter: (f.commited IS NOT TRUE)
                                                                          ->  Hash Join  (cost=136.62..273.55 rows=168 width=32) (actual time=0.706..14.201 rows=3754 loops=1)
                                                                                Hash Cond: (e.assembliesbatchid = d.assembliesbatchid)
                                                                                ->  Seq Scan on allocatedassemblies e  (cost=0.00..121.45 rows=3679 width=12) (actual time=0.039..6.140 rows=3754 loops=1)
                                                                                      Filter: ((- quantity) <> 0)
                                                                                ->  Hash  (cost=135.51..135.51 rows=89 width=24) (actual time=0.612..0.612 rows=88 loops=1)
                                                                                      ->  Bitmap Heap Scan on assembliesbatch d  (cost=17.69..135.51 rows=89 width=24) (actual time=0.122..0.521 rows=88 loops=1)
                                                                                            Recheck Cond: (assembliesbatchstatusid = ANY ('{1,2,4,7}'::integer[]))
                                                                                            ->  Bitmap Index Scan on fki_assembliesbatch_assembliesbatchstatus_id  (cost=0.00..17.66 rows=89 width=0) (actual time=0.098..0.098 rows=89 loops=1)
                                                                                                  Index Cond: (assembliesbatchstatusid = ANY ('{1,2,4,7}'::integer[]))
                                                                          ->  Bitmap Heap Scan on stocklog f  (cost=8.43..12.44 rows=1 width=9) (actual time=0.006..0.006 rows=0 loops=3754)
                                                                                Recheck Cond: ((f.refid = d.batchid) AND (f.partid = e.partid))
                                                                                Filter: (transtypeid = 3)
                                                                                ->  BitmapAnd  (cost=8.43..8.43 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=3754)
                                                                                      ->  Bitmap Index Scan on referenceidsl  (cost=0.00..3.62 rows=29 width=0) (actual time=0.001..0.001 rows=1 loops=3754)
                                                                                            Index Cond: (f.refid = d.batchid)
                                                                                      ->  Bitmap Index Scan on ix_partsstocklog  (cost=0.00..4.54 rows=132 width=0) (actual time=0.116..0.116 rows=379 loops=67)
                                                                                            Index Cond: (f.partid = e.partid)
                                                                    ->  Index Scan using assemblies_pkey on assemblies a  (cost=0.00..0.65 rows=1 width=38) (actual time=0.004..0.005 row