[GENERAL] Strange duplicate key violation error

2007-06-28 Thread Casey Duncan
I have this table "replica_sync_log" which is updated thousands of  
times a day to reflect the state of various schemas in a database  
which acts as an offline secondary to various other databases (each  
of the source databases is mapped to its own schema in the  
secondary). The table has the following definition:


Table "radio_prod_default.replica_sync_log"
  Column  |Type | Modifiers
--+-+---
db_host  | text| not null
db_port  | text| not null
db_name  | text| not null
last_sync_id | integer |
last_sync_time   | timestamp without time zone |
last_commit_time | timestamp without time zone |
Indexes:
"replica_sync_log_pkey" PRIMARY KEY, btree (db_host, db_port,  
db_name), tablespace "data1"

Tablespace: "data1"

There is in fact one of these tables for each schema; each one contains exactly one row (the "log" in the name is a bit misleading: these just contain the current replica state, not a running log).


Here is an error that I got this morning that doesn't make much sense  
to me:


2007-06-28 08:53:54.937 PDT [d:radio_reports_new u:slony s:4683d86f.3681 3] ERROR:  duplicate key violates unique constraint "replica_sync_log_pkey"
2007-06-28 08:53:54.937 PDT [d:radio_reports_new u:slony s:4683d86f.3681 4] STATEMENT:  UPDATE replica_sync_log SET last_sync_id = 7147892, last_sync_time = '2007-06-27 23:28:04.586846'::timestamp, last_commit_time = 'Thu Jun 28 08:53:54 2007'::timestamp WHERE db_name = 'radio_prod_default' AND db_host = 'radiodb-default-1' AND db_port = '5432';


I've never seen this error before in millions of updates to this  
table. It confuses me because the columns indexed by the primary key  
*are not* being modified, so how could the update violate the  
constraint? Furthermore there is only one record in the table anyway.
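For the record, the only sanity checks I can think of are roughly these (a sketch; note that REINDEX blocks writers while it runs):

SELECT db_host, db_port, db_name, count(*)
  FROM replica_sync_log
 GROUP BY db_host, db_port, db_name
HAVING count(*) > 1;   -- should return zero rows if the heap really has one row per key

REINDEX INDEX replica_sync_log_pkey;   -- rebuild the index in case it is the corrupted side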


The updates are performed by daemons that are in charge of  
replicating the changes for one database each. So there is no  
concurrent updating on this table, only one process updates it.


The error caused the daemon to pause, so I restarted it. The error  
did not recur the second time.


fwiw this is postgres 8.1.8 on Linux.

Thanks for any insights.

-Casey


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Backup Large Tables

2006-09-21 Thread Casey Duncan
Are you dumping the whole database or just a single table? If it's  
the former, try the latter and see if you still get errors.


If pg_dump is not working, maybe some system table is hosed. What  
errors are you getting?


If you can get in via psql, log in as a superuser and execute:

COPY mytable TO '/tmp/mytable.txt';

That will dump the table data to a text file on the server (note that COPY requires an absolute path), which can be re-imported into a new database using the COPY FROM command. Basically you're just doing part of what pg_dump does for you by hand.
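For the reload side, the mirror-image command would be something like this (path and table name are placeholders; both forms read and write files as the server user, and psql's \copy is the client-side equivalent):

COPY mytable FROM '/tmp/mytable.txt';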


-Casey

On Sep 21, 2006, at 9:19 PM, Charles Ambrose wrote:


Hi!

I encounter errors in dumping the database using pg_dump. The database, I think, is corrupt. It was looking for triggers and stored procedures that are no longer in the database. This is also the reason why I opted to create a program to dump the database.


On 9/22/06, Michael Nolan <[EMAIL PROTECTED]> wrote: I have a table  
with over 6 million rows in it that I do a dump on every night.  It  
takes less than 2 minutes to create a file that is around 650 MB.


Are you maybe dumping this file in 'insert' mode?
--
Mike Nolan


On 9/21/06, Charles Ambrose < [EMAIL PROTECTED]> wrote: Hi!

I have fairly large database tables (say an average of 3 million to 4 million records). Using the pg_dump utility takes forever to dump the database tables. As an alternative, I have created a program that gets all the data from the table and then puts it into a text file. I was also unsuccessful with this alternative to dump the database.








---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] postgress 8.1.4 deadlocking??

2006-09-27 Thread Casey Duncan

On Sep 27, 2006, at 7:28 AM, Rafal Pietrak wrote:


Hi,

I fell into the following problem (unfortunately, the database contents have sensitive customer information, so I can publish very little of that).


Currently a postgres process takes close to 100% CPU time.

I restarted the process a moment ago, and it was calm for a brief minute.

It started to consume the CPU once I issued:
ALTER GROUP majster DROP USER rafal;
... which took close to 5 minutes to complete. Then I issued:


Sounds like it was blocked (unsure by what). You can use pg_locks to  
check that.
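A minimal sketch of that check (pg_locks on 8.1 has pid, locktype, relation, mode and granted among its columns):

SELECT pid, locktype, relation::regclass AS relation, mode, granted
  FROM pg_locks
 WHERE NOT granted;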



VACUUM ANALYZE;
... which I had to cancel when it didn't complete within another 10 minutes. CPU utilisation remained at 100% even after the VACUUM was canceled.


VACUUM sometimes hangs around for a while after it is cancelled, which can be annoying. I think this has been improved for 8.2, FWIW.


What was consuming all the CPU? I assume from what you wrote  
previously that it is a postgres backend. If so, what was it doing?  
You can use ps, the pg_stat_activity view and strace to figure that out.



pg_dumpall was successful during the large-CPU-usage time.

...another postgres process restart, and another VACUUM ANALYZE; this one completes in 3-5 secs. Now "ALTER GROUP ..." goes OK. And everything seems to be working.


Was something else using the database previously? Was something else  
different this time than the last?




the database is currently almost empty: ca. 100k records within ca. 20 tables, with another 30 or so views. There are ca. 6k userids (roles) created in the cluster.


How big is the data on disk? Is it possible that you have queries  
scanning tables containing lots of dead tuples? If so a VACUUM FULL  
would help, but that's totally speculative.



Any ideas where I should look (system tables? process stats?) to diagnose the problem? Just to be prepared when it appears again?


Postgres itself should not spontaneously start using all of the cpu.  
Some query operations must be running unless there is an underlying  
problem (hardware, os, fs). When you saw this happen did you happen  
to check if it was user or system cpu usage? I had a corrupt filesystem once that caused pg to go off into the weeds consuming all cpu forever, but it was all *system* cpu time. Also I couldn't shut
pg down because it was presumably caught in an eternal system call  
unable to respond to signals. Your description above sounds different  
than this to me, however.


-Casey



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] postgress 8.1.4 deadlocking??

2006-09-27 Thread Casey Duncan


On Sep 27, 2006, at 12:35 PM, Rafal Pietrak wrote:

Thanks Duncan for the analysis.

This happened again, so I'm able to peek at the details you've pointed out.

On Wed, 2006-09-27 at 09:33 -0700, Casey Duncan wrote:

Sounds like it was blocked (unsure by what). You can use pg_locks to
check that.


That view currently contains 240 entries, the source of the blocking. Now I have to dig for the ultimate reason why those locks end up there (and are not being cleaned). Any ideas?


You need to look at the procpid that's holding the lock and then use pg_stat_activity to see what it is doing. I'm not sure what locks that query needs to hold; it may be blocked by any concurrent query from a user in that group, I dunno.
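A rough sketch of that lookup on 8.1 (it over-reports a bit, since it lists every granted lock matching the one a waiter wants, but it narrows things down):

SELECT w.pid AS waiting_pid, h.pid AS blocking_pid, w.locktype, a.usename, a.current_query
  FROM pg_locks w
  JOIN pg_locks h ON h.granted AND h.pid <> w.pid
       AND (h.relation = w.relation OR h.transactionid = w.transactionid)
  JOIN pg_stat_activity a ON a.procpid = h.pid
 WHERE NOT w.granted;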




Would those locks go away by themselves in time?


Yeah, once the transactions holding the locks complete.




What was consuming all the CPU? I assume from what you wrote
previously that it is a postgres backend. If so, what was it doing?


ps output (just one backend)
postgres  4759  0.0  2.0  19368  5292 ?S21:02   0:00
postgres: czyn_738773 czyn 127.0.0.1(55431) UPDATE waiting
---

"czyn" is the name of the database, "czyn_738773" is database  
username.

IP address is localhost, since the database purpose is web serbvice
back-end.


If the update is waiting, it is blocked and should not be consuming any noticeable cpu. Was this the one using lots of CPU at that moment?




Still, I'm not sure what that UPDATE could be.

You can use ps, the pg_stat_activity view and strace to figure  
that out.


- excerpt from pg_stat_activity -
 17660 | czyn |4759 |19541 | czyn_738773 |  | | 2006-09-27 21:02:05.914728+02 | 127.0.0.1   |
55431
(18 rows)
--

But this is not very meaningful to me.


You need to add this to your postgresql.conf:

stats_command_string = on

Then send a HUP signal to the postmaster (you can use pg_ctl reload for this, or 'kill -HUP').


Then you can see what the exact command is.
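Once that's on, something like this shows what each backend is running:

SELECT procpid, usename, query_start, current_query
  FROM pg_stat_activity
 WHERE current_query NOT LIKE '<IDLE>%';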

Another option that may be helpful to diagnose is:

log_min_duration_statement = 500

This will output the sql for all commands taking longer than 500ms.  
You can adjust this up and down depending on what counts as "slow" in  
your app. You probably only want it on temporarily, but I find it  
sometimes helpful to always leave it on with a suitably high value  
(like 1000). That way you can always troll the logs for laggards.




...another postgres process restart, and another VACUUM ANALYZE; this one completes in 3-5 secs. Now "ALTER GROUP ..." goes OK. And everything seems to be working.


Was something else using the database previously? Was something else
different this time than the last?


No. My impression was (naive, I know) that not all things were correctly cleaned up on the first restart. Now I think that on the second go I was just a little bit luckier - the www service was calm for a little longer. That was enough for VACUUM to complete.


Yeah, it sounds like things are fairly busy given the locks above.



Still, trouble started later.


How big is the data on disk? Is it possible that you have queries


close to nothing: ca. 200 MiB


Sounds like the problem is more about concurrency than data size. How many concurrent connections do you have (select count(*) from pg_stat_activity)? How many are in transaction and running queries concurrently?


If you see multiple instances of a given query (or the same one  
appearing frequently), try running EXPLAIN ANALYZE on it. That'll  
tell you what makes it slow.





problem (hardware, os, fs). When you saw this happen did you happen
to check if it was user or system cpu usage? I had a corrupt


It's the user CPU time.


Yeah, just sounds like a scaling issue. Hopefully easily solved ;^)



As it currently looks like some unknown UPDATE causes the trouble, how
can I identify what it is?


see config recommendations above.


1. the database is not very sophisticated; apart from what I've mentioned earlier, it has just a few stored procedures in pgsql - nothing fancy. And there are no triggers, just specialized views (although quite a lot of referential integrity constraints).


These can be a source of lock contention on updates (though this is  
much better on 8.1 compared to 7.4). Once you know the update query  
that's getting blocked you can know for sure.



2. machine is 'from the old days': 450MHz CPU


What is the iowait like on this box? You can use iostat to determine  
this. That will tell you if the disks are also struggling.
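For example (iostat -x comes with the sysstat package on Linux):

iostat -x 5 3   # extended device stats, 5-second samples, 3 reports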



3. the system is currently under 'available-for-public' conditions, so there may be 'waves' (peaks) of activity.
4. when the system was tested in-lab (small number of users - no  
risk /

[GENERAL] Expected accuracy of planner statistics

2006-09-28 Thread Casey Duncan
I have some databases that have grown significantly over time (as databases do). As the databases have grown, I have noticed that the statistics have grown less and less accurate. In particular, the n_distinct values have become many orders of magnitude (OOM) too small for certain foreign key columns. Predictably this leads to poor query plans.


The databases in question were all using the default stats target  
value, so naturally the thing to do is to increase that and see what  
happens. First I'll show you one table in question:


qa_full=# \d fk
                     Table "public.fk"
    Column     |            Type             |   Modifiers
---------------+-----------------------------+----------------
 fk_id         | bigint                      | not null
 st_id         | bigint                      | not null
 is_positive   | boolean                     | not null
 mc_id         | character varying(20)       | not null
 matching_seed | character varying(20)       |
 ft_id         | character varying(20)       |
 s_title       | text                        | not null
 a_summary     | text                        | not null
 date_created  | timestamp without time zone | default now()
 qx_id         | bigint                      |
Indexes:
    "fk_pkey" PRIMARY KEY, btree (fk_id)
    "fk_st_mc_id_idx" UNIQUE, btree (st_id, mc_id)
    "fk_date_created_is_positive_idx" btree (is_positive, date_created)
    "fk_st_id_idx" btree (st_id)
Foreign-key constraints:
    "fk_qx_id_fkey" FOREIGN KEY (qx_id) REFERENCES st(st_id) ON DELETE RESTRICT
    "fk_st_id_fkey" FOREIGN KEY (st_id) REFERENCES st(st_id) ON DELETE RESTRICT



qa_full=# select count(*) from fk;
   count
---
19889

Here are the n_distinct stats on the st_id column with stock stats  
settings:


qa_full=# select attname, n_distinct from pg_stats where tablename='fk' and attname='st_id';

 attname | n_distinct
---------+------------
 st_id   |      14910

here's the actual distinct count:

qa_full=# select count(distinct st_id) from fk;
count
--
15191387
(1 row)
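(For reference, the target is raised per column and the table re-analyzed like this; setting default_statistics_target cluster-wide works too:)

ALTER TABLE fk ALTER COLUMN st_id SET STATISTICS 100;
ANALYZE fk (st_id);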

Here's what it looks like after turning the stats target up to 100:

qa_full=# select attname, n_distinct from pg_stats where tablename='fk' and attname='st_id';

 attname | n_distinct
---------+------------
 st_id   |     136977

Still way off (3 OOM), so let's pull out the stops and go for 1000:

qa_full=# select attname, n_distinct from pg_stats where tablename='fk' and attname='st_id';

 attname | n_distinct
---------+------------
 st_id   |     860796

Better, but still way off. Here's more of the pg_stats row for the  
curious with the stats target at 1000:


schemaname        | public
tablename         | fk
attname           | st_id
null_frac         | 0
avg_width         | 8
n_distinct        | 860796
most_common_vals  | {9822972459012807,81553350123749183,50260420266636724,16953859416556337,...many elided...}
most_common_freqs | {7.3e-05,6.7e-05,5.3e-05,5e-05,5e-05,4.7e-05,4.7e-05,...many elided...}
histogram_bounds  | {9474697855526,186642098833097,425502410065792,655064117100237,...many elided...}
correlation       | 0.770339

The correlation is likely high here because this table has been clustered on that column.

Re: [GENERAL] Expected accuracy of planner statistics

2006-09-28 Thread Casey Duncan

On Sep 28, 2006, at 8:51 PM, Tom Lane wrote:
[..]
The information we've seen says that the only statistically  
reliable way

to arrive at an accurate n_distinct estimate is to examine most of the
table :-(.  Which seems infeasible for extremely large tables,  
which is

exactly where the problem is worst.  Marginal increases in the sample
size seem unlikely to help much ... as indeed your experiment shows.


I think a first step might be to introduce a new analyze command,  
such as ANALYZE FULL. This would be executed deliberately (IOW not by  
autovacuum) like CLUSTER or VACUUM FULL when deemed necessary by the  
dba. The command as implied would scan the entire table and fill in  
the stats based on that (as analyze used to do IIRC). It would also  
be useful if this command froze the stats so that autovacuum didn't  
clobber them with inaccurate ones shortly thereafter. Perhaps an  
explicit ANALYZE FULL FREEZE command would be useful for that case,  
the behavior being that a normal ANALYZE would not overwrite the  
stats for a stats-frozen table, another ANALYZE FULL would, however.  
Such a frozen state would also be useful if you wanted to hand-tweak  
stats for a single table and have it stick and still use autovac. As  
I understand it now, with autovac on, you cannot do that unless you  
hack the pg_autovacuum table (i.e., set anl_base_thresh to an  
artificially high value).
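For the record, that workaround looks roughly like this on 8.1 (a sketch; check \d pg_autovacuum for the exact column list on your version, and -1 means "use the default"):

INSERT INTO pg_autovacuum
    (vacrelid, enabled, vac_base_thresh, vac_scale_factor,
     anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit)
VALUES ('fk'::regclass, true, -1, -1, 2000000000, -1, -1, -1);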


Another option (that I think others have suggested) would be to make  
this the behavior for VACUUM ANALYZE. That saves the baggage of a new  
command at least. Another advantage would be that the autovac daemon  
could run it. Perhaps some smarts could also be built in. What if  
VACUUM ANALYZE first runs a normal (sampled) ANALYZE. Then it  
performs the VACUUM with full ANALYZE pass. The stats gathered by the  
latter full pass are compared to that of the first sampled pass. If  
the full ANALYZE statistics are sufficiently different from the  
sampled pass, then the table is flagged so that normal ANALYZE is not  
performed by the autovac daemon on that table. Also, a global ANALYZE  
could ignore it (though this seems more magical).


A more pie-in-the-sky idea could take advantage of the fact that the larger a table is, the less likely the statistics are to change much over time. If we cannot afford to sample many rows in a given analyze pass, then perhaps we should use a "Newton's method" approach where we attempt to converge on an accurate value over time, with each analyze pass contributing more samples to the statistics and honing them incrementally rather than simply replacing the old ones.


I'm no statistician, so it's not clear to me how much more state you would need to keep between analyze passes to make this viable, but in order for this to work the following would need to be true:


1) Analyze would need to be run on a regular basis (luckily we have autovacuum to help). You would want to analyze this table periodically even if nothing much changed, however. Perhaps tuning the autovac parameters is enough here.


2) Each analyze pass would need to sample randomly so that multiple  
passes tend to sample different rows.


3) The stats would need to somehow be cumulative. Perhaps this means  
storing sample values between passes, or some other statistical voodoo.


4) Needs to be smart enough to realize when a table has changed  
drastically, and toss out the old stats in this case. Either that or  
we require a human to tell us via ANALYZE FULL/VACUUM ANALYZE.


I think that the incremental stats approach would more or less depend  
on the full ANALYZE functionality for bootstrapping. I think when you  
first load the table, you want to get the stats right immediately and  
not wait some indeterminate amount of time for them to "converge" on  
the right value.


-Casey







---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Array assignment behavior (was Re: [ADMIN] Stored procedure array limits)

2006-09-29 Thread Casey Duncan

On Sep 29, 2006, at 9:14 AM, Tom Lane wrote:


[ expanding this thread, as it now needs wider discussion ]

"Paul B. Anderson" <[EMAIL PROTECTED]> writes:

Actually, I was not filling all of the arrays in sequential order.  I
added code to initialize them in order and the function seems to be
working now.  Is that a known problem?


Well, it's a documented behavior: section 8.10.4 saith

A stored array value can be enlarged by assigning to an element
adjacent to those already present, or by assigning to a slice
that is adjacent to or overlaps the data already present.

Up to 8.2 we didn't have a lot of choice about this, because  
without any
ability to have nulls embedded in arrays, there wasn't any sane  
thing to

do with the intermediate positions if you assigned to an element not
adjacent to the existing range.  As of 8.2 we could allow  
assignment to

arbitrary positions by filling the intermediate positions with nulls.
The code hasn't actually been changed to allow that, but it's  
something

we could consider doing now.

Comments?


At first blush, this strikes me as a bit too magical/implicit. Are there other languages where sequences behave similarly? The best analogy that comes to mind is sparse files, but in that case there is an implicit contract that the intervening empty regions do not actually occupy physical space; it doesn't sound like that's true here.
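To make the current rules concrete, a quick sketch (table made up; IIRC the last statement errors out before 8.2, whereas under the proposal positions 5..9 would be filled with NULLs):

CREATE TEMP TABLE arr_test (a int[]);
INSERT INTO arr_test VALUES ('{1,2,3}');
UPDATE arr_test SET a[4] = 4;     -- fine: adjacent to the existing elements
UPDATE arr_test SET a[10] = 10;   -- not adjacent: an error today, NULL padding under the proposal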


I think the result of this change would be more difficult debugging  
of off-by-one errors and their ilk, rather than actually being a real  
benefit.


OTOH, perhaps there is a real use-case I am missing here. I don't see  
the rest of this thread on GENERAL and I couldn't find it searching  
the archives, where did it come from?


-Casey


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Two efficiency questions - clustering and ints

2006-10-06 Thread Casey Duncan


On Oct 6, 2006, at 11:12 AM, John D. Burger wrote:


Richard Huxton wrote:

Should I always cluster the tables?  That is, even if no column  
jumps out as being involved in most queries, should I pick a  
likely one and cluster on it?


Well you cluster on an index, and if you don't think the index is  
useful, I'd drop it. If you have an index, clustering isn't  
necessarily going to help you unless you regularly read a series  
of rows in order.


Right, but clustering on that index means an index scan (with that  
index) will be faster.  This is uninteresting if the index doesn't  
get used, but is there any =downside= to doing it?


Here's a simpler question - for static data, should I always  
cluster on the index I think will be used the most?


Depends on the queries. If the index is on a foreign key value where there may be many rows with the same key scattered about, it will help queries that look up using that foreign key. Clustering on a column with high cardinality typically isn't much of a win for single-key lookups (it depends on the lookup pattern), but it would be for ranges and possibly for sorting on that column.


It also depends on the size of the table and indices. If they are  
small enough to fit in memory then clustering to reduce random access  
isn't really helpful.


I would suggest doing some timing tests on typical queries with the  
data unclustered and clustered to know what you are gaining.
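Something like this sketch, with made-up names (note that 8.1's CLUSTER puts the index name first, and it rewrites and locks the table while it runs):

\timing
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
CLUSTER orders_customer_id_idx ON orders;
ANALYZE orders;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;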


-Casey

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [GENERAL] how to get number of minutes?

2006-10-06 Thread Casey Duncan

select extract(epoch from interval '2 hours')/60;

'epoch' returns the number of epoch seconds that comprise the interval. That differs from 'seconds', which just returns the "seconds place", which is zero for 2:00:00 of course.
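So for the example below, the whole thing looks like:

select extract(epoch from (now() - (now() - '2 hours'::interval))) / 60;   -- returns 120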


-Casey

On Oct 6, 2006, at 12:22 PM, Chris Hoover wrote:

If I subtract 2 timestamps, how do I get the results returned as  
the total number of minutes.


example

now() - (now()-'2 hours'::interval) = 2:00:00  -- I need the result  
returned as 120 minutes.


Thanks,

Chris



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] COPY FROM STDIN instead of INSERT

2006-10-18 Thread Casey Duncan


On Oct 18, 2006, at 5:20 AM, Ilja Golshtein wrote:


When starting a database from scratch it is much faster to import the
data and then create the indexes. The time to create index on a full
table is less than the extra time from each index update from the
inserts. The more indexes to update the more time updating indexes  
takes.


The problem with a live database is that removing the indexes slows down current users, and if you are adding 2,000 rows to a table that already has 5,000,000 rows in it then you will lose the benefit.


I agree with you 100%. What you are describing is a very good and useful technique for some maintenance operations.


My current goal is to increase performance in normal [almost ;)]  
OLTP mode of my application, so removing indexes for some time is  
not an option here.


And my question remains.
Is it Ok to use COPY BINARY FROM STDIN instead of multiple INSERTs?


I don't think I would use BINARY, it seems likely to be susceptible  
to changes in the underlying data type storage. From the docs:


"To determine the appropriate binary format for the actual tuple data  
you should consult the PostgreSQL source, in particular the *send and  
*recv functions for each column's data type (typically these  
functions are found in the src/backend/utils/adt/ directory of the  
source distribution)."


Regular text COPY is quite a big performance win over INSERTs, and  
doesn't require your application to know the vagaries of the data  
storage. Also, if you have many indices, time to update them will  
probably dominate anyhow, making the difference between binary and  
text copy negligible.
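For reference, a minimal text-mode sketch as you'd send it from psql (table and rows are made up; columns are tab-separated and the data ends with a backslash-period line):

COPY mytable (id, name) FROM STDIN;
1	alpha
2	beta
\.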


As was mentioned, COPY can only insert static data, and does not support rules (which you might use to support constraint exclusion, etc). AFAIK, the locking semantics are the same as INSERT, i.e., it does not lock the entire table or anything like that. Since it would allow transactions that insert data to finish faster, it should actually work better under high concurrency.


-Casey

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Help needed

2006-10-19 Thread Casey Duncan
On Oct 19, 2006, at 8:19 AM, <[EMAIL PROTECTED]>  
<[EMAIL PROTECTED]> wrote:



I want to upgrade a system from 7.3.8 to 8.1.5

I am new to Psql and looking for handy hints

Any known problems or pitfalls?


You'll need to dump the database and reload (pg_dump and pg_restore),  
8.1 uses a different data format.
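One way to do the mechanics, as a rough sketch (ports and file names are made up; use the newer pg_dumpall binary against the old server if you can):

pg_dumpall -U postgres -p 5432 > old_cluster.sql       # dump the old cluster
psql -U postgres -p 5433 -f old_cluster.sql postgres   # load into the new 8.1 cluster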


There can be issues with character encodings. IIRC 8.1 is much more  
strict about what it allows than 7.4 was. So you could potentially  
run into errors importing the data into 8.1 from 7.x.


I would keep the 7.4 database around after dumping it to correct any  
possible encoding glitches.


There are also quite a few query planner changes between these versions. They are certainly improvements, but they can mean that queries that work correctly on 7.4 fail to use the same indices or just behave differently on 8.1. Most of the time things will just work (and work better), but you'll want to watch out for that.



Anything we need to do in preparation ?


I would just do a dry run. Dump the 7.4 database and install 8.1.5  
somewhere (perhaps in a local directory or on another machine). Then  
load the dump into it and run the ANALYZE sql command.


If it loads successfully, test out your applications against it. Test  
it and test it some more.


8.1 is typically configured differently than 7.4 wrt things like  
shared_buffers and whatnot. You'll definitely want to adjust the  
defaults for things (upwards) and you'll probably want different  
settings than before even running on the same hardware.


You'll probably also want to turn autovacuum on, and possibly set vacuum_cost_delay. This page describes a lot of the settings with recommendations (even though it's getting pretty dated now, the info is still useful):


http://www.powerpostgresql.com/Downloads/annotated_conf_80.html
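As a very rough starting point, these are the kinds of 8.1 settings I mean (the numbers are placeholders, not recommendations; size them to your hardware):

# postgresql.conf sketch for 8.1 -- example values only
shared_buffers = 25000       # number of 8 kB buffers (~200 MB), well above the old default
work_mem = 16384             # kB per sort/hash operation
autovacuum = on
stats_start_collector = on   # needed for autovacuum
stats_row_level = on         # needed for autovacuum
vacuum_cost_delay = 10       # ms; throttles vacuum I/O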

hth,

-Casey


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] AutoVacuum on demand?

2006-11-13 Thread Casey Duncan

On Nov 13, 2006, at 1:05 PM, Glen Parker wrote:


Matthew T. O'Connor wrote:

Glen Parker wrote:
I would like a way to run the autovacuum daemon on demand  
periodically.   Every night at 2 AM, for example.


Anybody know if this is possible?  If not, it's a feature  
request :-)
Autovacuum can be enabled / disabled on the fly using the GUC  
settings.  Perhaps you can write a cron script that runs at 2AM  
that enables autovac, and at 6AM (just a guess) disables it in the  
same fashion.  I think that should work.

Would be curious to hear your results.


I thought it sounded like a good idea, but then...

SET autovacuum = on;
ERROR:  parameter "autovacuum" cannot be changed now


I've turned it on and off in the postgresql.conf several times  
(followed by a postmaster HUP) and it worked as expected. I think  
this is a global setting that cannot be toggled in a particular session.
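A sketch of the cron approach suggested above (paths are placeholders; sed flips the autovacuum line and pg_ctl reload sends the HUP):

# enable autovacuum at 2AM, disable it at 6AM
0 2 * * * sed -i 's/^autovacuum *= *off/autovacuum = on/' /var/lib/postgresql/data/postgresql.conf && pg_ctl reload -D /var/lib/postgresql/data
0 6 * * * sed -i 's/^autovacuum *= *on/autovacuum = off/' /var/lib/postgresql/data/postgresql.conf && pg_ctl reload -D /var/lib/postgresql/data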


-Casey

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] statement_timeout

2006-11-16 Thread Casey Duncan
When I configure statement_timeout globally, I typically override it  
for superusers and other accounts used by dbas. Just issue:


  ALTER USER postgres SET statement_timeout = 0;

Repeat for other superusers (slony, etc). Then the policy won't apply  
to them.


-Casey

On Nov 16, 2006, at 6:46 AM, Marcin Mańk wrote:


Hello.
I have an unconfirmed feeling that autovac does not like a system-wide statement_timeout. I.e. when I, in a panic move, set the system-wide statement_timeout to 90 seconds, autovac stopped working (I do not know 100% if there is a dependency).

Oops... Now I checked that pg_dump has the same issue, and vacuumdb. This is all on 8.1.4.

Yeah, a system-wide statement_timeout is not much of a brilliant idea :(

Regards,
Marcin Mańk





---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] n00b RAID + wal hot standby question

2006-12-04 Thread Casey Duncan

On Dec 4, 2006, at 1:11 PM, Anton Melser wrote:


Hi,
I am just starting at a company and we are inheriting a previously
built solution. It looks pretty good but my previous experience with
pg is seriously small-time compared with this...
I am very new at the job, and don't know what hd config we have but it
will be RAID-something I imagine (hey I was working with desktop
"servers" before this!). If that is very important I can find out. We
seem to be saving our WAL to the same partition as PGDATA, and I
notice that we are maxing out a reasonable looking server. The db is
not very big (~4gig, 400meg pgdump), and though I can't see any vacuum
strategy (autovacuum on a 8.1.4 is disabled), we didn't have as much
consistent CPU usage at my old job (with a 6 gig db and MUCH less CPU
and RAM, and probably as many connections), and my vacuum strategy was
also pitiful!  Sure, completely different environments, but I am
thinking that WAL replication could be a factor.
So my question... being in complete ignorance of how RAID works (the
performance details)... would it be better to try and separate the WAL
destination from PGDATA? How much of a difference could it make?
Should we wait till the customer starts complaining (no explosion in
traffic/db size realistic for the foreseeable future...)?
Any abuse welcome.


When you say maxing out, what do you mean? Posting some vmstat output  
under load would be very helpful (assuming *nix, otherwise can't help  
you ;^).


My very general RAID recommendation would be:

Put the wal on a 2 disk RAID-1 array either by itself or together  
with the system if you expect little disk activity from the rest of  
the system.


Put the data on a RAID-10 array (at least 4 disks, but more would be  
much better).


As for the hardware itself, we've had good luck with 10k rpm WD  
raptor SATA drives + 3ware 95xx raid controllers (the 9550SX being  
preferred due to their smarter caching and higher bandwidth).


The most important thing, though, is to keep as much of the database in RAM as possible. That means putting as much RAM in the database box as you can afford and not sharing it with other processes (i.e., move other RAM-hungry things to another box). And configure postgres to use the available RAM by tuning shared_buffers, work_mem, etc.


Tuning checkpoint_segments and wal_buffers can help with write  
performance.
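A sketch of the knobs in question for 8.1 (values here are illustrative only and need to be sized to the box, not recommendations):

# postgresql.conf sketch -- example values only
shared_buffers = 50000          # 8 kB buffers, ~400 MB
work_mem = 16384                # kB per sort/hash
effective_cache_size = 300000   # 8 kB pages, ~2.4 GB; roughly what the OS will cache
checkpoint_segments = 16        # more WAL between checkpoints smooths write bursts
wal_buffers = 64                # 8 kB pages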


And running autovacuum (or regular full database vacuums) is very  
important to the performance of the database over time. Otherwise you  
will definitely notice significant performance degradation as the  
garbage in the tables mounts (unless of course the db is read-only).  
If it's been off for a long time (or never run), you'll probably need  
to spend some quality time doing a "VACUUM FULL" and possibly a  
"REINDEX DATABASE" first.


-Casey

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] [PERFORM] Locking in PostgreSQL?

2006-12-06 Thread Casey Duncan

On Dec 5, 2006, at 11:04 PM, Joost Kraaijeveld wrote:


Does PostgreSQL lock the entire row in a table if I update only 1
column?


Know that updating 1 column is actually updating the whole row. So if  
one transaction updates column A of a row, it will block another  
concurrent transaction that tries to update column B of the same row.  
As was mentioned however, neither of these transactions block others  
reading the row in question, though they see the row as it existed  
before the updates until those update transactions commit.
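A sketch of that interaction (table and columns are made up; the second UPDATE simply waits until the first transaction commits or rolls back):

-- session 1
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;   -- takes the row lock

-- session 2, same row but a different column: blocks here
BEGIN;
UPDATE accounts SET last_touched = now() WHERE id = 1;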


If you know that your application will suffer excessive update  
contention trying to update different columns of the same row, you  
could consider splitting the columns into separate tables. This is an  
optimization to favor write contention over read performance (since  
you would likely need to join the tables when selecting) and I  
wouldn't do it speculatively. I'd only do it if profiling the  
application demonstrated significantly better performance with two  
tables.


-Casey

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Anything I can do to speed up this query?

2006-12-06 Thread Casey Duncan

On Dec 6, 2006, at 2:34 PM, Wei Weng wrote:


On Tue, 2006-12-05 at 15:56 -0500, Wei Weng wrote:

I have a table that has roughly 200,000 entries and many columns.

The query is very simple:

SELECT Field1, Field2, Field3... FieldN FROM TargetTable;

TargetTable has an index that is Field1.



I think I have discovered the reason for why the query runs so  
slow. The
original query has an ORDER BY Field1 clause that I forgot to put  
in my

email.

So the query looks like this:

SELECT Field1, Field2, Field3... FieldN FROM TargetTable ORDER BY  
Field1

DESC;

What is the effective way to optimize this query(or to optimize the
system) to run a little faster than it does now?

Thanks and I really appreciate all the help I've gotten so far.


Clustering the table on the index used for the sort might help, but performance of the above is probably worse than your original example due to the added random access overhead caused by the index scan (assuming the sort uses an index). If the table changes infrequently, you could consider creating an in-order copy of the data (using INSERT INTO) so that you don't need to use an index to order it.
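A sketch of the in-order copy (names are hypothetical; the copy has to be rebuilt whenever the source data changes):

CREATE TABLE targettable_ordered (LIKE targettable);
INSERT INTO targettable_ordered
  SELECT * FROM targettable ORDER BY field1 DESC;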


If you need the full list of results before the application can do  
anything (thus cursors won't help), then you'll either need to reduce  
the amount of data scanned and returned (optimize fields for byte  
size, move any fields not needed by this query to another table,  
reduce the number of rows in the table, etc) or scan it faster  
(faster disks and cpu).


If you have multiple cpus on the database side, and you are not I/O bound, you could consider using table partitioning to break the table up into pieces where each piece contains a fraction of the whole (say one quarter). Then issue four concurrent queries from the application and reassemble the results on that end. You might not need to use table partitioning if you can efficiently determine the "cleave" points at run time; then you would just use a where clause to select the proper range.
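If you can compute the cleave points up front, each of the (say) four connections would just run its own slice, something like this (bounds and column names are placeholders):

SELECT field1, field2 /* ... */ FROM targettable
 WHERE field1 >= 'bound_a' AND field1 < 'bound_b'
 ORDER BY field1 DESC;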


-Casey



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] replication advice needed

2007-01-10 Thread Casey Duncan
You could use slony for this in a couple of ways. One is a simpler, more hacky way; another is a bit more involved but perhaps more "correct":


For the simple way, set up the source database as a provider and the remote replica as a normal slony subscriber. Don't run the slon daemons all the time; just launch them and let them run every night to catch up the replica. Newer slony versions allow you to run slon in a mode where it terminates after a certain point. This is simple, but has a couple of drawbacks:


- The data will become bloated in the source during the day, because each insert/update to a replicated table will create log data in ancillary slony tables that won't get cleaned out until you replicate at night. This is probably a pretty minor issue though, unless the data is huge.
- The replication process will put appreciable load on the source database. And as the log tables grow, it takes longer and longer for the replica to catch up.


Another approach is perhaps more "standard" though more complex:
- Setup a slony provider and subscriber both on the local network  
(you need at least one secondary to use log shipping)

- Setup a remote secondary as a "log shipping" node
- Run slons for the local provider and subscriber with "log shipping"  
mode turned on. This will generate sql files that contain the data  
updates made to the provider.
- Periodically tar up the files, ship them over to the remote  
secondary and apply them (using psql or some custom script).
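The nightly ship-and-apply step could be as simple as this sketch (host names and paths are made up; the archive files are applied in filename order):

# on the provider side
tar czf /tmp/sync.tar.gz -C /var/lib/slony/archive .
scp /tmp/sync.tar.gz standby:/tmp/

# on the remote secondary
mkdir -p /tmp/slony_apply && tar xzf /tmp/sync.tar.gz -C /tmp/slony_apply
for f in $(ls /tmp/slony_apply/*.sql | sort); do psql -U postgres -d mydb -f "$f" || break; done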


An advantage to the latter is that you can compress the log shipping  
files. They are not very compact (since they contain sql statements  
and your data), but they should compress quite well.


hth,

-Casey


On Jan 10, 2007, at 11:02 AM, Gene wrote:


I searched the web sites for advice but came up short on something
that looked like what I need. I need to "replicate" a few tables
nightly (2am-6am). Each table will have about 1 million new records
(~100B each, ~5/second) each day.

I have tried using something like a cron job (copy from/copy to) but it has limitations as far as handling duplicate keys (just need to replace) and it will not fill in gaps if there happen to be any. The servers are also far from each other and only have a 50 KB/s ipsec link between them, so data transfer needs to be somewhat minimized.

thanks for any suggestions,
gene




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


[GENERAL] Where art thou pg_clog?

2007-02-15 Thread Casey Duncan
We have a production system with multiple identical database  
instances on the same hardware, with the same configuration, running  
databases with the exact same schema. They each have different data,  
but the database sizes and load patterns are almost exactly the same.


We are running pg 8.1.5 (upgraded the day before 8.1.6 came out, oh  
well ;^) and since then we have noticed the following error on two of  
the servers:


2007-02-15 00:35:03.324 PST ERROR:  could not access status of  
transaction 2565134864
2007-02-15 00:35:03.325 PST DETAIL:  could not open file "pg_clog/ 
098E": No such file or directory


The first time this happened, I chalked it up to some kind of disk  
corruption based on the mailing list archives. So I dumped the  
databases, did a fresh initdb, forced an fsck (these run with a jfs  
data partition and an ext2 wal partition) which found no problems and  
then reloaded the databases.


Now about a week later I see the same problem on a different server. We never saw this problem running 8.1.3 on these same machines over many months, so I'm beginning to suspect that something we changed since running 8.1.3 is to blame. Before the upgrade these systems ran postgres 8.1.3 and slony 1.1.5. Now they run postgres 8.1.5 and slony 1.2.6 (I don't know that the slony version is important; I add it here for completeness). Nothing else important has changed on these boxes. I see that 8.1.8 is out now, though nothing I see in the release notes seems relevant to this issue.


Here are some specific things I'd like to know:

1. Is it possible to "fix" this problem without an dumpall/initdb/ 
restore. That takes many hours and can only be done when I'm supposed  
to be at home relaxing (yeah right) ;^) FWIW, the system is  
functioning fine right now from what I can tell, save the above  
errors in the log every few minutes.


2. What more info can I give to figure out the "cause" of this. Are  
there files I can inspect to find out more?


3. Is it possible that this is a side-affect of the upgrade to 8.1.5?

Thanks for any insights,

-Casey

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Where art thou pg_clog?

2007-02-15 Thread Casey Duncan

On Feb 15, 2007, at 1:50 PM, Peter Eisentraut wrote:


Casey Duncan wrote:

2007-02-15 00:35:03.324 PST ERROR:  could not access status of
transaction 2565134864
2007-02-15 00:35:03.325 PST DETAIL:  could not open file "pg_clog/
098E": No such file or directory

The first time this happened, I chalked it up to some kind of disk
corruption based on the mailing list archives. So I dumped the
databases, did a fresh initdb, forced an fsck (these run with a jfs
data partition and an ext2 wal partition) which found no problems and
then reloaded the databases.

Now about a week later


Unless you actually executed 2565134864 transactions in that one week,
this is still data corruption.

Check for faulty memory.


I'd be more inclined to agree with you if it happened on only one  
server machine. But this has now happened on two different machines  
in the space of a week.


My understanding is that the transaction id logged is garbage because  
the bookkeeping fields have been clobbered for some tuple(s). The one  
last week was really low (like < 1000).


-Casey

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Where art thou pg_clog?

2007-02-15 Thread Casey Duncan


On Feb 15, 2007, at 1:46 PM, Alvaro Herrera wrote:


Casey Duncan wrote:

We have a production system with multiple identical database
instances on the same hardware, with the same configuration, running
databases with the exact same schema. They each have different data,
but the database sizes and load patterns are almost exactly the same.

We are running pg 8.1.5 (upgraded the day before 8.1.6 came out, oh
well ;^) and since then we have noticed the following error on two of
the servers:

2007-02-15 00:35:03.324 PST ERROR:  could not access status of
transaction 2565134864
2007-02-15 00:35:03.325 PST DETAIL:  could not open file "pg_clog/
098E": No such file or directory


Can you relate it to autovacuum?


Maybe. Here's what I get when I crank up the logging to debug4:

2007-02-15 14:20:48.771 PST DEBUG:  StartTransaction
2007-02-15 14:20:48.771 PST DEBUG:  name: unnamed; blockState:
DEFAULT; state: INPROGR, xid/subid/cid: 3429052708/1/0, nestlvl: 1,  
children: <>

2007-02-15 14:20:48.771 PST DEBUG:  vacuuming "pg_catalog.pg_statistic"
2007-02-15 14:20:48.771 PST ERROR:  could not access status of  
transaction 2565134864
2007-02-15 14:20:48.772 PST DETAIL:  could not open file "pg_clog/ 
098E": No such file or directory

2007-02-15 14:20:48.772 PST DEBUG:  proc_exit(0)
2007-02-15 14:20:48.772 PST DEBUG:  shmem_exit(0)
2007-02-15 14:20:48.773 PST DEBUG:  exit(0)
2007-02-15 14:20:48.775 PST DEBUG:  reaping dead processes

does that imply that it is the pg_statistic table that is hosed?

Interestingly I can manually vacuum that table in all of the  
databases on this machine without provoking the error.


-Casey


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Where art thou pg_clog?

2007-02-15 Thread Casey Duncan


On Feb 15, 2007, at 2:44 PM, Alvaro Herrera wrote:


Casey Duncan wrote:


On Feb 15, 2007, at 1:46 PM, Alvaro Herrera wrote:

[..]
Can you relate it to autovacuum?


Maybe. Here's what I get when I crank up the logging to debug4:

2007-02-15 14:20:48.771 PST DEBUG:  StartTransaction
2007-02-15 14:20:48.771 PST DEBUG:  name: unnamed; blockState:
DEFAULT; state: INPROGR, xid/subid/cid: 3429052708/1/0, nestlvl: 1,
children: <>
2007-02-15 14:20:48.771 PST DEBUG:  vacuuming  
"pg_catalog.pg_statistic"

2007-02-15 14:20:48.771 PST ERROR:  could not access status of
transaction 2565134864
2007-02-15 14:20:48.772 PST DETAIL:  could not open file "pg_clog/
098E": No such file or directory
2007-02-15 14:20:48.772 PST DEBUG:  proc_exit(0)
2007-02-15 14:20:48.772 PST DEBUG:  shmem_exit(0)
2007-02-15 14:20:48.773 PST DEBUG:  exit(0)
2007-02-15 14:20:48.775 PST DEBUG:  reaping dead processes

does that imply that it is the pg_statistic table that is hosed?

Interestingly I can manually vacuum that table in all of the
databases on this machine without provoking the error.


Except template0 I presume?  Is this autovacuum running in template0
perchance?  I note that 800 million transactions have passed since the
Xid in the error message was current.


Wouldn't you know it! A little farther back up in the log file:

2007-02-15 14:20:48.480 PST LOG:  autovacuum: processing database  
"template0"

2007-02-15 14:20:48.480 PST DEBUG:  StartTransaction
2007-02-15 14:20:48.480 PST DEBUG:  name: unnamed; blockState:
DEFAULT; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1,  
children: <>
2007-02-15 14:20:48.481 PST DEBUG:  autovacuum: VACUUM FREEZE whole  
database

2007-02-15 14:20:48.481 PST DEBUG:  CommitTransaction
2007-02-15 14:20:48.481 PST DEBUG:  name: unnamed; blockState:
STARTED; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1,  
children: <>


fwiw, I did a cluster-wide vacuum on 1/20/2007. Not sure if that has  
any impact on anything, just thought I'd throw it out there.


I'm curious how template0 got stomped on. Certainly nothing's been  
changing it. Of course it might just be some random bug so the fact  
it landed on a file for template0 could be completely arbitrary.  
Anyhow it does seem curious to me.


-Casey

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Where art thou pg_clog?

2007-02-15 Thread Casey Duncan


On Feb 15, 2007, at 5:21 PM, Alvaro Herrera wrote:


Casey Duncan wrote:


Interestingly I can manually vacuum that table in all of the
databases on this machine without provoking the error.


Except template0 I presume?  Is this autovacuum running in template0
perchance?  I note that 800 million transactions have passed  
since the

Xid in the error message was current.


Wouldn't you know it! A little farther back up in the log file:

2007-02-15 14:20:48.480 PST LOG:  autovacuum: processing database
"template0"
2007-02-15 14:20:48.480 PST DEBUG:  StartTransaction
2007-02-15 14:20:48.480 PST DEBUG:  name: unnamed; blockState:
DEFAULT; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1,
children: <>
2007-02-15 14:20:48.481 PST DEBUG:  autovacuum: VACUUM FREEZE whole
database
2007-02-15 14:20:48.481 PST DEBUG:  CommitTransaction
2007-02-15 14:20:48.481 PST DEBUG:  name: unnamed; blockState:
STARTED; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1,
children: <>


This is a bug we fixed in 8.1.7.  I suggest you update to the  
latest of

the 8.1 series, to get that fix among others.


ok, great.


To fix the problem, set pg_database.datallowconn=true for template0,
then connect to it and do a VACUUM FREEZE.  Then set  
datallowconn=false

again.
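Spelled out, that recipe would be roughly:

UPDATE pg_database SET datallowconn = true WHERE datname = 'template0';
-- connect to template0 as a superuser, then:
VACUUM FREEZE;
-- and back in any other database:
UPDATE pg_database SET datallowconn = false WHERE datname = 'template0';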


Do you mean to do this after upgrading to 8.1.8? If I try that in 8.1.5, I get (unsurprisingly):


% psql -U postgres template0 -c "vacuum freeze"
ERROR:  could not access status of transaction 2565134864
DETAIL:  could not open file "pg_clog/098E": No such file or directory




I'm curious how template0 got stomped on. Certainly nothing's been
changing it. Of course it might just be some random bug so the fact
it landed on a file for template0 could be completely arbitrary.


The problem is that all databases are vacuumed every so many
transactions, to avoid Xid wraparound problems; even non connectable
databases.  The problem is that a bug in autovacuum caused that vacuum
operation to neglect using the FREEZE flag; this negligence makes it
leave non-permanent Xids in the tables, leading to the problem you're
seeing.


Ironically, we were earlier bitten by the bug that autovacuum didn't do the cluster-wide vacuum until too late. Now we got bitten by the fact that it did do the cluster-wide vacuum. Talk about damned-if-you-do-and-damned-if-you-don't! 8^)


ok, this is a much better sounding explanation than "random data  
corruption" ;^)


Thanks!

-Casey

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Where art thou pg_clog?

2007-02-16 Thread Casey Duncan


On Feb 15, 2007, at 5:50 PM, Alvaro Herrera wrote:


Casey Duncan wrote:


On Feb 15, 2007, at 5:21 PM, Alvaro Herrera wrote:


Casey Duncan wrote:



To fix the problem, set pg_database.datallowconn=true for template0,
then connect to it and do a VACUUM FREEZE.  Then set
datallowconn=false
again.


Do you mean to do this after upgrading to 8.1.8? If I try than in
8.1.5, I get (unsurprisingly):

% psql -U postgres template0 -c "vacuum freeze"
ERROR:  could not access status of transaction 2565134864
DETAIL:  could not open file "pg_clog/098E": No such file or  
directory


Hum, yeah, I forgot to mention that you need to create the 098E  
pg_clog

segment for that to work at all :-)  Fill it with byte 0x55 till the
needed position, which is the bit pattern for "all transactions
committed".  I'd make sure to remove it manually after the freeze is
done, just in case!  (I think the system would remove it at next
checkpoint, but anyway.)
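In shell terms that fill would be something like this untested sketch (assuming the default 256 kB clog segment size and that $PGDATA points at the cluster):

dd if=/dev/zero bs=256k count=1 | tr '\0' '\125' > $PGDATA/pg_clog/098E   # octal 125 = 0x55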


That seems a bit scary to do on a running production server. Could I  
get away with dropping the template0 database and loading one from  
another identical pg instance (or a new one) or will that freak  
things out?


-Casey

---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] Large aggregate query running out of memory in ExecutorState

2007-03-02 Thread Casey Duncan
I have some nightly statisics queries that runs against a view which  
unions several large tables. Recently one of these queries started  
running into out of memory errors. This is on postgresql 8.1.8  
running on 32-bit Debian Linux.


Here is the error in the log including the query (excluding memory  
detail):


2007-03-02 05:20:46.302 PST [d:reports u:hits s:45e81fd1.635a 3]  
ERROR:  out of memory
2007-03-02 05:20:46.302 PST [d:reports u:hits s:45e81fd1.635a 4]  
DETAIL:  Failed on request of size 16.
2007-03-02 05:20:46.302 PST [d:reports u:hits s:45e81fd1.635a 5]  
STATEMENT:  select count(case when is_positive then true end),
    count(case when not is_positive then true end),
    count(distinct st_id),
    count(distinct st_id & X''::bigint)
    from fb_packed;

Here is the query plan for the above:

   QUERY PLAN
 
-

Aggregate  (cost=20393102.13..20393102.14 rows=1 width=9)
   ->  Append  (cost=0.00..12015096.06 rows=418900303 width=45)
 ->  Seq Scan on fb_packed  (cost=0.00..20.30 rows=1030  
width=45)
 ->  Seq Scan on fb_packed  (cost=0.00..1411898.37  
rows=75468337 width=45)
 ->  Seq Scan on fb_packed  (cost=0.00..1675027.88  
rows=89690588 width=45)
 ->  Seq Scan on fb_packed  (cost=0.00..1553071.43  
rows=83061343 width=45)
 ->  Seq Scan on fb_packed  (cost=0.00..1632686.20  
rows=87618920 width=45)
 ->  Seq Scan on fb_packed  (cost=0.00..1553388.85  
rows=83060085 width=45)

(8 rows)

Not very complicated at all, just lots of rows to process. The  
estimates for row quantity and width are roughly correct above.


Here is the relevant bits from the memory detail in the logs. Notice  
the first ExecutorState value, which doesn't seem reasonable:


TopMemoryContext: 8241296 total in 12 blocks; 5528 free (13 chunks);  
8235768 used
TopTransactionContext: 8192 total in 1 blocks; 7856 free (0 chunks);  
336 used
Local Buffer Lookup Table: 57344 total in 3 blocks; 23096 free (7  
chunks); 34248 used
Type information cache: 8192 total in 1 blocks; 1864 free (0 chunks);  
6328 used
Operator class cache: 8192 total in 1 blocks; 4936 free (0 chunks);  
3256 used
MessageContext: 122880 total in 4 blocks; 59280 free (3 chunks);  
63600 used
smgr relation table: 24576 total in 2 blocks; 9952 free (3 chunks);  
14624 used

Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 848 free (0 chunks); 176 used
ExecutorState: -1525554452 total in 337 blocks; 1476423576 free  
(20505908 chunks); 1292989268 used

ExecutorState: 24576 total in 2 blocks; 15560 free (3 chunks); 9016 used
[...Remainder elided]

This just started recently, with no changes to the query itself.  
Seems like we just crossed some size threshold that triggered this  
condition. I tried analyzing all of the tables again just in case,  
but it didn't help.


TIA,

-Casey

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[GENERAL] Unit testing plpgsql code

2004-11-20 Thread Casey Duncan
Anyone know of a unit testing framework for plpgsql stored procedures?
We are about to convert a bunch of stored procedures from Oracle over
to Postgresql. We currently use the utPLSQL package to unit test
those, so something comparable to it would be optimal.

We'll probably roll our own if nothing currently exists, but I figured
it was worth it to ask first before reinventing the wheel.
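If we do roll our own, I imagine the core of it is just an assert helper in plpgsql, something like this sketch (my_func is a made-up function under test; the polymorphic arguments need 7.4+ and the dollar quoting needs 8.0+):

CREATE OR REPLACE FUNCTION assert_equals(msg text, expected anyelement, actual anyelement)
RETURNS void AS $$
BEGIN
    -- raise an error when the two values differ (NULL-safe comparison)
    IF expected IS DISTINCT FROM actual THEN
        RAISE EXCEPTION 'FAIL %: expected %, got %', msg, expected, actual;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- usage: SELECT assert_equals('my_func(6, 7)', 42, my_func(6, 7));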

TIA,

-Casey

---(end of broadcast)---
TIP 8: explain analyze is your friend


[GENERAL] Possible infinite loop in query using bitmap scans

2006-03-13 Thread Casey Duncan
I have this report query that runs daily on a table with several  
hundred million rows total using pg 8.1.3 on Debian Linux on hw with  
dual opteron processors:


 SELECT count(*) FROM webhits
   WHERE path LIKE '/radio/tuner_%.swf' AND status = 200
   AND date_recorded >= '3/10/2006'::TIMESTAMP
   AND date_recorded < '3/11/2006'::TIMESTAMP;

Here is the explain output:

 
QUERY PLAN
 
 
--

 Aggregate  (cost=794775.08..794775.09 rows=1 width=0)
   ->  Bitmap Heap Scan on webhits  (cost=315820.45..794771.74  
rows=1337 width=0)
 Recheck Cond: ((date_recorded >= '2006-03-10  
00:00:00'::timestamp without time zone) AND (date_recorded <  
'2006-03-11 00:00:00'::timestamp without time zone))
 Filter: (((path)::text ~~ '/radio/tuner_%.swf'::text) AND  
(status = 200))

 ->  BitmapAnd  (cost=315820.45..315820.45 rows=249152 width=0)
   ->  Bitmap Index Scan on webhits_date_idx1   
(cost=0.00..140407.45 rows=15379741 width=0)
 Index Cond: ((date_recorded >= '2006-03-10  
00:00:00'::timestamp without time zone) AND (date_recorded <  
'2006-03-11 00:00:00'::timestamp without time zone))
   ->  Bitmap Index Scan on webhits_path_idx2   
(cost=0.00..175412.76 rows=15343959 width=0)
 Index Cond: (((path)::text >=  
'/radio/tuner'::character varying) AND ((path)::text <  
'/radio/tunes'::character varying))


According to the planner it should take <15 minutes, which is typical in  
practice. About half the time it runs, however, it never terminates  
(even after days) and just spins, consuming 99+% of CPU with no disk  
activity. This query was never a problem in postgres versions < 8.1.2;  
however, the data has grown substantially since then. I notice it  
uses the recent in-memory bitmap scan feature, so I wonder whether it  
is exposing a bug.


If I restart the postmaster, the query will complete in the expected  
time.


-Casey


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Possible infinite loop in query using bitmap scans

2006-03-13 Thread Casey Duncan


On Mar 13, 2006, at 9:50 AM, Michael Fuhr wrote:


On Sun, Mar 12, 2006 at 11:36:23PM -0800, Casey Duncan wrote:

 SELECT count(*) FROM webhits
   WHERE path LIKE '/radio/tuner_%.swf' AND status = 200
   AND date_recorded >= '3/10/2006'::TIMESTAMP
   AND date_recorded < '3/11/2006'::TIMESTAMP;

[...]

 Aggregate  (cost=794775.08..794775.09 rows=1 width=0)

[...]
According to the planner it should take <15 minutes, which is typical in practice.


The planner's cost estimate is in units of disk page fetches, not
time.  The above estimate isn't 794775.09 ms (~13.25 min) but rather
794775.09 times the cost of a single page fetch, however much that
is.  See "Using EXPLAIN" in the "Performance Tips" chapter of the
documentation.

http://www.postgresql.org/docs/8.1/interactive/performance-tips.html#USING-EXPLAIN


Doh! I should've known that.


About half the times it runs, however, it never terminates
(even after days) and just spins consuming 99+% of CPU with no disk
activity. This query was never a problem in postgres versions < 8.1.2,
however the data has grown substantially since that time. I notice it
uses the recent in-memory bitmap feature, so I wondered if it was
exposing a bug.


If the problem happens half the time then you have a somewhat
repeatable test case.  Do you get more consistent performance if
you set enable_bitmapscan to off?  What's the query plan if you do
that?


Here's the plan with bitmap scans off:

 
                                          QUERY PLAN
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=891363.71..891363.72 rows=1 width=0)
   ->  Index Scan using webhits_date_idx1 on webhits  (cost=0.00..891360.30 rows=1362 width=0)
         Index Cond: ((date_recorded >= '2006-03-10 00:00:00'::timestamp without time zone) AND (date_recorded < '2006-03-11 00:00:00'::timestamp without time zone))
         Filter: (((path)::text ~~ '/radio/tuner_%.swf'::text) AND (status = 200))
(4 rows)

The query runs to completion this way in about 40 minutes. I turned  
bitmap scans back on and it hung again (I let it run for about 5 hours).



If you narrow the search criteria so the query returns fewer rows,
do you still see the problem?  Can you identify a "sour spot" where
the problem starts to happen?


I'll do that tomorrow, and let you know.


If I restart the postmaster, the query will complete in the expected
time.


Does the problem eventually start happening again?  If so, after
how long?  How did you determine that the restart is relevant?  Do
you consistently see different (presumably better) performance after
a restart than if you don't restart?


This is a production box, so I can't restart it whenever I want, and I  
haven't yet reproduced the problem elsewhere -- the database size makes  
that cumbersome at best -- but once after it hung up, I restarted postgres  
and the report ran to completion for a couple of days before it started  
hanging again. Today it seems to hang pretty consistently; I'll see if  
I can restart it overnight and test again.


-Casey


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [GENERAL] Possible infinite loop in query using bitmap scans

2006-03-14 Thread Casey Duncan

On Mar 13, 2006, at 5:25 PM, Casey Duncan wrote:
[..]

If I restart the postmaster, the query will complete in the expected
time.


Does the problem eventually start happening again?  If so, after
how long?  How did you determine that the restart is relevant?  Do
you consistently see different (presumably better) performance after
a restart than if you don't restart?


I restarted postgres this morning and this time it didn't seem to help. 
That query has been running for several hours now. I'm going to let it 
go a while longer to see if it eventually completes, but I suspect it 
won't. So perhaps the prior restart was just dumb luck.


I'll try some narrower queries as well.

-Casey


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Possible infinite loop in query using bitmap scans

2006-03-14 Thread Casey Duncan

On Mar 14, 2006, at 1:31 PM, Casey Duncan wrote:

On Mar 13, 2006, at 5:25 PM, Casey Duncan wrote:
[..]

If I restart the postmaster, the query will complete in the expected
time.


Does the problem eventually start happening again?  If so, after
how long?  How did you determine that the restart is relevant?  Do
you consistently see different (presumably better) performance after
a restart than if you don't restart?


I restarted postgres this morning and this time it didn't seem to 
help. That query has been running for several hours now. I'm going to 
let it go a while longer to see if it eventually completes, but I 
suspect it won't. So perhaps the prior restart was just dumb luck.


I'll try some narrower queries as well.


Interestingly, I can only narrow the query by about an hour (i.e., a 23-hour 
span) before the plan changes to a simple index scan. That may explain why 
this only started hanging recently: the planner probably only began choosing 
the bitmap scan for a full-day interval a short while ago.
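
For reference, this is roughly the narrowed form that already plans as a plain index scan (the exact cutoff times here are illustrative):

  EXPLAIN SELECT count(*) FROM webhits
    WHERE path LIKE '/radio/tuner_%.swf' AND status = 200
    AND date_recorded >= '2006-03-10 00:00:00'::TIMESTAMP
    AND date_recorded < '2006-03-10 23:00:00'::TIMESTAMP;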


Let me know if there is anything I should do or any info you need to 
further pin this down.


In the meantime I will disable bitmap scans for this query.
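
A session-local toggle around just the report query should be enough; a minimal sketch:

  SET enable_bitmapscan = off;
  SELECT count(*) FROM webhits
    WHERE path LIKE '/radio/tuner_%.swf' AND status = 200
    AND date_recorded >= '3/10/2006'::TIMESTAMP
    AND date_recorded < '3/11/2006'::TIMESTAMP;
  RESET enable_bitmapscan;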

Thanks.

-Casey


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] What is your favorite front end for user interaction to postgresql databases?

2006-05-08 Thread Casey Duncan

On May 8, 2006, at 3:33 PM, Karen Hill wrote:


What is your favorite front end for end users to interact with your
postgresql db?  Is it java, .net, web apache + php, MS-Access, ruby on
rails?  Why is it your favorite?  Which would you recommend for end
users on multiple OSes?


This is totally dependent on the individual user's preference:

- Many users comfortable with command line shells prefer psql (this is my choice).
- Others who like a GUI might use pgAdmin (or myriad others).
- If interoperability with MS Office is your main concern, I could see Access/ODBC being useful. As a general front end, probably not.


When you start talking about Java vs. .NET vs. PHP vs. Ruby, that's a  
whole other ball of wax. That's a lot less about interaction, or even  
about databases, and a lot more about programmer preference.  
Personally, I would recommend:

- Python + psycopg (highly interactive, great for complex scripts or even apps)
- Java + JDBC + Hibernate (I think JDBC sucks, but Hibernate totally rocks and more or less hides it from you)
- Shell + psql (great for simple reporting, automating DBA tasks, etc.)


Also, what do you think of having the database management system do all
work (business logic etc) with the front end as a user interface vs.
having N-tier with an application server handling business logic, and
the db just accepting data. I currently have the opinion that N-tier
is not as good as having the db contain the business logic via stored
procedures. This comes from having to re-invent the wheel every time a
new user app needs to be created.


From personal experience (and others will disagree), I find putting  
logic in the database to be a bad idea. I only use stored procedures  
for triggers and the like, and I try to avoid those whenever possible.


Here are my reasons why:

- I don't like the PL/pgSQL language (yes, there are alternatives, but they have their own drawbacks)
- It's complex to test and upgrade (we actually wrote non-trivial infrastructure to automate both)
- It's difficult to debug (compared to external languages like Python or Java)
- It's difficult to profile, so getting good performance can be difficult


I had a very complex system coded in stored procedures that performed  
poorly and was hard to maintain. It's now refactored into Java/Hibernate  
code that's simpler to understand, performs much better, and is easier  
to extend and maintain. Of course that's just my particular case and  
obviously YMMV.


Stored procs could make a lot of sense if you have many different  
clients accessing the db in different ways and you want to strictly  
enforce business rules across all of them. I had no such requirements  
in my case.


In any case I would strongly recommend doing the simplest thing that  
you can get away with. If your business rules can be fulfilled with  
grants, views and constraints alone, use them.
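
For example (table, view, and role names here are made up purely to illustrate), a rule like "balances can never go negative" plus "reporting users only see active accounts" fits entirely in plain DDL, with no stored procedures at all:

  -- the business rule lives in a CHECK constraint
  CREATE TABLE account (
      id      serial PRIMARY KEY,
      email   text NOT NULL UNIQUE,
      balance numeric NOT NULL DEFAULT 0,
      CONSTRAINT balance_non_negative CHECK (balance >= 0)
  );

  -- reporting users only get a restricted view, enforced by grants
  CREATE VIEW active_account AS
      SELECT id, email FROM account WHERE balance > 0;

  CREATE ROLE reporting LOGIN;
  GRANT SELECT ON active_account TO reporting;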


-Casey

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[GENERAL] autovacuum "connections" are hidden

2006-05-17 Thread Casey Duncan
Trying to drop a database this morning, I ran into the not-so-unusual  
error:


dropdb: database removal failed: ERROR:  database "test_seg1" is  
being accessed by other users


However, when I did "select * from pg_stat_activity" on the pg  
server, it showed no connection to that db. Then I looked at the  
processes:


tmp0% ps ax | grep test_seg1
10317 ?D  0:36 postgres: autovacuum process   test_seg1

I imagine this doesn't show up because there is no connection per se;  
the autovacuum process is a bona fide backend process in its own  
right. It's just a bit confusing in this circumstance.
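
For the record, this is roughly the check that came up empty (using the 8.1 column names); the autovacuum backend simply never appears in it:

  SELECT datname, procpid, usename, current_query
    FROM pg_stat_activity
   WHERE datname = 'test_seg1';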


I guess this is more of a heads up than a question.

Another circumstance where this has bitten me is in doing a slony  
SUBSCRIBE_SET. At least in 1.1.5 the autovac daemon can deadlock with  
slony and cause the subscriber operation to fail. When this happened  
to me it was somewhat baffling at first because I had altered  
pg_hba.conf to forbid all non-superuser connections and there were no  
other connections visible at the time (in pg_stat_activity). Turning  
off autovacuum during the subscribe fixed this, but not without a  
little head-scratching (and maybe some luck).


No obvious way to improve the visibility of these autovacuum  
"connections" comes to mind, but I thought I would throw this out here  
in case anyone has thoughts on the matter.


-Casey





---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] autovacuum "connections" are hidden

2006-05-18 Thread Casey Duncan

On May 17, 2006, at 12:34 PM, Tom Lane wrote:


Casey Duncan <[EMAIL PROTECTED]> writes:

however, when I did "select * from pg_stat_activity" on the pg
server, it showed no connection to that db. Then I looked at the
processes:
tmp0% ps ax | grep test_seg1
10317 ?D  0:36 postgres: autovacuum process   test_seg1


Hmm, autovacuum won't show up in pg_stat_activity because it never calls
pgstat_report_activity().  Seems like maybe it should, though.


Is this worth filing a bug about for posterity?

-Casey


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Cannot restart postgres

2006-06-19 Thread Casey Duncan
Are there any other partitions on that machine with space available?  
If so, you could move some files there from your postgres data dir and  
symlink to them from their original location. At least then you might  
get it to start, so you can get a pg_dump to work.


-Casey

On Jun 19, 2006, at 1:43 PM, elein wrote:


7.x installation.  (Yeah, yeah, upgrade.)

I have a disk which has only the database on it.
It has become full and postgres won't start.

Is there an option (besides restore from backup)
other than trying to move the data dir to a bigger
disk?

Would pg_resetxlog do the right thing to free up
some space so I can go in and vacuum and delete rows?

This is somewhat urgent.

Thanks for your help.

--elein
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Can't start PostgreSQL

2006-07-27 Thread Casey Duncan
It seems you were running a pre-8.x postgresql version before; its  
data files are not compatible with the new version you have now.  
You'll need to find out which version used to be installed by  
looking at the PG_VERSION file in your postgres data directory.

Once you know that, install the old version (you can download it from  
postgresql.org), start it against the existing data directory, and  
dump the databases. You can use pg_dumpall for this.

Once you have dumped the dbs, move the old data directory aside,  
run initdb for the new postgresql (8.1.4), and start it.  
Then import the databases you previously dumped.


hth,

-Casey

On Jul 27, 2006, at 9:22 AM, Ian Johnson wrote:


Hi List

I want to develop an application in PostgreSQL but when starting  
the service I get the following message:


An old version of the database format was found.
You need to upgrade the data format before using PostgreSQL.
See /usr/share/doc/postgresql-8.1.4/README.rpm-dist for more  
information.


I am running Fedora Core 5, but the system was initially built with  
Fedora Core 3, with PostgreSQL installed and automatically updated  
over the past 2 years.  I may have created a test database 2  
years ago; is this the source of the problem?  I could find  
nothing in README.rpm-dist that helps.


I would appreciate any advice on how to deal with this problem.


Thank you


Ian Johnson



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq