[PERFORM] Configuration Advice

2007-01-17 Thread Steve

Hey there;

I've been lurking on this list awhile, and I've been working with postgres 
for a number of years so I'm not exactly new to this.  But I'm still 
having trouble getting a good balance of settings and I'd like to see what 
other people think.  We may also be willing to hire a contractor to help 
tackle this problem if anyone is interested.


I've got an application here that runs large (in terms of length -- the 
queries have a lot of conditions in them) queries that can potentially 
return millions of rows but on average probably return tens of thousands 
of rows.  It's read only for most of the day, and pretty much all the 
queries except one are really fast.


However, each night we load data from a legacy cobol system into the SQL 
system and then we summarize that data to make the reports faster.  This 
load process is intensely insert/update driven but also has a hefty 
amount of selects as well.  This load process is taking ever longer to 
complete.



SO ... our goal here is to make this load process take less time.  It 
seems the big part is building the big summary table; this big summary 
table is currently 9 million rows big.  Every night, we drop the table, 
re-create it, build the 9 million rows of data (we use COPY to put hte 
data in when it's prepared, not INSERT), and then build the indexes on it 
-- of which there are many.  Unfortunately this table gets queried 
in a lot of different ways and needs these indexes; also unfortunately, we 
have operator class indexes to support both ASC and DESC sorting on 
columns so these are for all intents and purposes duplicate but required 
under Postgres 8.1 (we've recently upgraded to Postgres 8.2, is this still 
a requirement?)
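
In outline, the nightly rebuild is just this (a rough sketch -- the real 
table has about 60 columns and many more indexes, and the file path is a 
stand-in):

DROP TABLE IF EXISTS detail_summary;
CREATE TABLE detail_summary (
    detailsummary_id  serial,       -- plus ~60 more columns in the real table
    encounter_id      integer,
    receipt           date
);
-- one COPY per prepared text file of roughly 10,000 rows
COPY detail_summary FROM '/data/summarizer/block_0001.txt';
-- then the many CREATE INDEX statements, including the ASC/DESC opclass pairs
CREATE INDEX detail_summary_encounter_id_idx ON detail_summary (encounter_id);
CREATE INDEX detail_summary_receipt_id_idx   ON detail_summary (receipt);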


Building these indexes takes forever!  It's a long grind through inserts, 
and then building the indexes takes a hefty amount of time too (about 9 
hours).  Now, the application is likely partly at fault, and we're working 
to make it more efficient, but that has nothing to do with the index 
building time.  I'm wondering what we can do to make this better, if 
anything; would it be better to leave the indexes on?  It doesn't seem to 
be.  Would it be better to use INSERTs instead of COPY?  Doesn't seem to 
be.



Anyway -- ANYTHING we can do to make this go faster is appreciated :) 
Here's some vital statistics:


- Machine is a 16 GB, 4 actual CPU dual-core opteron system using SCSI 
discs.  The disc configuration seems to be a good one, it's the best of 
all the ones we've tested so far.


- The load process itself takes about 6 gigs of memory, the rest is free 
for postgres because this is basically all the machine does.


- If this was your machine and situation, how would you lay out the memory 
settings?  What would you set the FSM to?  Would you leave the bgwriter on 
or off?  We've already got fsync off because "data integrity" doesn't 
matter -- this stuff is religiously backed up and we've got no problem 
reinstalling it.  Besides, in order for this machine to go down, data 
integrity of the DB is the least of the worries :)


Do wal_buffers/full_page_writes matter if fsync is off?  If so, what 
settings?  What about checkpoints?


And finally, any ideas on planner constants?  Here's what I'm using:

seq_page_cost = 0.5 # measured on an arbitrary scale
random_page_cost = 1.0  # same scale as above
cpu_tuple_cost = 0.001  # same scale as above
cpu_index_tuple_cost = 0.0001   # same scale as above
cpu_operator_cost = 0.00025 # same scale as above
effective_cache_size = 679006

I really don't remember how I came up with that effective_cache_size 
number



Anyway... any advice would be appreciated :)


Steve



Re: [PERFORM] Configuration Advice

2007-01-17 Thread Steve


1) What is the size of the whole database? Does that fit in your memory?
  That's the first thing I'd like to know and I can't find it in your
  post.


Current on-disk size is about 51 gig.  I'm not sure if there's a different 
size I should be looking at instead, but that's what du tells me the 
directory for the database in the "base" directory is sized at.  So, no, 
it doesn't fit into memory all the way.



  I'm missing several other important values too - namely

shared_buffers
max_fsm_pages
work_mem
maintenance_work_mem



I didn't share these because they've been in flux :)  I've been 
experimenting with different values, but currently we're using:


8GB shared_buffers
10  max_fsm_pages
256MB   work_mem
6GB maintenance_work_mem


  BTW, is the autovacuum daemon running? If yes, try to stop it during
  the import (and run ANALYZE after the import of data).


No.  All vacuums are done explicitly, since the database doesn't change 
during the day.  The 'order of operations' is:


- Load COBOL data into database (inserts/updates)
- VACUUM COBOL data
- Summarize COBOL data (inserts/updates with the big table using COPY)
- VACUUM summary tables

So everything gets vacuumed as soon as it's updated.


2) What is the size of a disc page? Without that we can only guess what
  the effective_cache_size number means - in the usual case it's
  8kB, thus giving about 5.2 GiB of memory.



I believe it's 8kB.  I definitely haven't changed it :)


  As suggested in http://www.powerpostgresql.com/PerfList I'd increase
  that to about 1,400,000, which is about 10.5 GiB (about 2/3 of RAM).

  Anyway - don't be afraid that this will break something. This is just
  information for PostgreSQL about how much memory the OS is probably
  using as a filesystem cache. PostgreSQL uses this to estimate the
  probability that a page is already in cache.
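
For reference, effective_cache_size is counted in those 8 kB pages, so the 
arithmetic behind the two numbers is roughly:

SELECT 679006  * 8 / 1048576.0 AS current_setting_gib,    -- about 5.2 GiB
       1400000 * 8 / 1048576.0 AS suggested_setting_gib;  -- about 10.7 GiB, ~2/3 of 16 GB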



Okay, I'll try the value you recommend. :)



3) What is the value of maintenance_work_mem? This is a very important
  value for CREATE INDEX (and some other operations). The lower this value
  is, the slower the CREATE INDEX is. So try to increase the value as much
  as you can - this could / should improve the import performance
  considerably.

  But be careful - this does influence the amount of memory allocated
  by PostgreSQL. Being in your position I wouldn't do this in
  postgresql.conf - I'd do it in the connection used by the import,
  using the SET command, i.e. something like

  SET maintenance_work_mem = 524288;
  CREATE INDEX ...
  CREATE INDEX ...
  CREATE INDEX ...
  CREATE INDEX ...

  for 512 MiB of maintenance_work_mem. Maybe even a higher value
  could be used (1 GiB?). Just try to fiddle with this a little.


	It's currently at 6GB in postgresql.conf, though you have a good 
point in that maybe it should be set right before the indexes are built, 
to save room.  Things are certainly kinda tight in the config as is.



4) Try to set up some performance monitoring - for example 'dstat' is
  a nice way to do that. This way you can find out where the
  bottleneck is (memory, I/O, etc.)

That's basically all I can think of right now.



Thanks for the tips :)


Steve



Re: [PERFORM] Configuration Advice

2007-01-17 Thread Steve



On Wed, 17 Jan 2007, Benjamin Minshall wrote:




Building these indexes takes forever!


Anyway -- ANYTHING we can do to make this go faster is appreciated :) 
Here's some vital statistics:


- Machine is a 16 GB, 4 actual CPU dual-core opteron system using SCSI 
discs.  The disc configuration seems to be a good one, it's the best of all 
the ones we've tested so far.


What are your shared_buffers, work_mem, and maintenance_work_mem settings?

maintenance_work_mem is used for CREATE INDEX, and with 16GB of memory in the 
machine, maintenance_work_mem should be set to at least 1GB in my opinion.




shared_buffers = 8GB
work_mem = 256MB
maintenance_work_mem = 6GB

So that should be covered, unless I'm using too much memory and swapping. 
It does look like it's swapping a little, but not too badly as far as I 
can tell.  I'm thinking of dialing back everything a bit, but I'm not 
really sure what the heck to do :)  It's all guessing for me right now.



Steve



Re: [PERFORM] Configuration Advice

2007-01-17 Thread Steve

How many rows do you typically load each night?  If it is say less than 10%
of the total rows, then perhaps the suggestion in the next paragraph is
reasonable.


	Hrm.  It's very, very variable.  I'd say it's more than 10% on 
average, and it can actually be pretty close to 50-100% on certain days. 
Our data is based upon customer submissions, and normally it's a daily 
basis kind of deal, but sometimes they'll resubmit their entire year on 
certain deadlines to make sure it's all in.  Now, we don't have to 
optimize for those deadlines, just the 'average daily load'.  It's okay if 
on those deadlines it takes forever, because that's understandable.


	However, I will look into this and see if I can figure out this 
average value.  This may be a valid idea, and I'll look some more at it.



Thanks!

Steve


SO ... our goal here is to make this load process take less time.  It

seems the big part is building the big summary table; this big summary
table is currently 9 million rows big.  Every night, we drop the table,
re-create it, build the 9 million rows of data (we use COPY to put the
data in when it's prepared, not INSERT), and then build the indexes on it



Perhaps placing a trigger on the source table and building a "change log"
would be useful.  For example, you could scan the change log (looking for
inserts, updates, and deletes) and integrate those changes into your summary
table.  Obviously if you are using complex aggregates it may not be possible
to adjust the summary table, but if you are performing simple SUMs,
COUNTs, etc., then this is a workable solution.
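
One possible shape for that change log, as a sketch (all names here are 
hypothetical, and it assumes PL/pgSQL is installed; the trigger just records 
which source rows were touched so the summarizer can reprocess only those):

CREATE TABLE source_changes (
    change_time  timestamp DEFAULT now(),
    op           char(1),           -- 'I', 'U' or 'D'
    source_id    integer
);

CREATE OR REPLACE FUNCTION log_source_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO source_changes (op, source_id) VALUES ('D', OLD.source_id);
        RETURN OLD;
    END IF;
    INSERT INTO source_changes (op, source_id)
         VALUES (substr(TG_OP, 1, 1), NEW.source_id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER source_change_log
    AFTER INSERT OR UPDATE OR DELETE ON source
    FOR EACH ROW EXECUTE PROCEDURE log_source_change();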


--
Chad
http://www.postgresqlforums.com/





Re: [PERFORM] Configuration Advice

2007-01-17 Thread Steve
Would it be possible to just update the summary table, instead of recreating 
it from scratch every night?


	Hrm, I believe it's probably less work for the computer to do if 
it's rebuilt.  Any number of rows may be changed during an update, not 
including additions, so I'd have to pull out what's changed and sync it 
with what's in the summary table already.  It'll be a lot more selects and 
program-side computation to save the big copy; it might work out, but I'd 
say this would be my last ditch thing. :)


Building these indexes takes forever!  It's a long grind through inserts, 
and then building the indexes takes a hefty amount of time too (about 9 
hours).  Now, the application is likely partly at fault, and we're working to 
make it more efficient, but that has nothing to do with the index building 
time.  I'm wondering what we can do to make this better, if anything; would 
it be better to leave the indexes on?  It doesn't seem to be.  Would it be 
better to use INSERTs instead of COPY?  Doesn't seem to be.


Would it help if you created multiple indexes simultaneously? You have enough 
CPU to do it. Is the index creation CPU or I/O bound? 9 million rows should 
fit in 16 GB of memory, right?


	This is a very very interesting idea.  It looks like we're 
probably not fully utilizing the machine for the index build, and this 
could be the ticket for us.  I'm going to go ahead and set up a test for 
this and we'll see how it goes.
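
The test will be something like the following sketch -- each block in its 
own psql session, since a plain CREATE INDEX only takes a ShareLock and two 
of them can build against the same table at once (index definitions here 
are stand-ins for the real ones):

-- session 1:
SET maintenance_work_mem = '1GB';
CREATE INDEX detail_summary_encounter_id_idx ON detail_summary (encounter_id);

-- session 2, started at the same time:
SET maintenance_work_mem = '1GB';
CREATE INDEX detail_summary_receipt_id_idx ON detail_summary (receipt);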


Can you describe the load process in more detail? What's it doing with the 6 
gigs?


	There's two halves to the load process; the loader and the 
summarizer.  The loader is the part that takes 6 gigs; the summarizer only 
takes a few hundred MEG.


	Basically we have these COBOL files that vary in size but 
are usually in the hundreds of MEG realm.  These files contain new data 
OR updates to existing data.  We load this data from the COBOL files in 
chunks, so that's not a place where we're burning a lot of memory.


	The first thing we do is cache the list of COBOL ID codes that are 
already in the DB; the COBOL ID codes are really long numeric strings, so 
we use a sequenced integer primary key.  The cache translates COBOL IDs to 
primary keys, and this takes most of our memory nowadays.  Our cache is 
fast, but it's kind of a memory hog.  We're working on trimming that down, 
but it's definitely faster than making a query for each COBOL ID.


	The load is relatively fast and is considered "acceptable", and 
has been relatively constant in speed.  It's the summarizer that's brutal.


	The summarizer produces 3 main summary tables and a few 
secondaries that don't take much time to make.  Two of them are smallish 
and not that big a deal, and the last one is the biggie that's 9 mil rows 
and growing.  To produce the 9 mil row table, we query out the data in 
groups, do our processing, and save that data to a series of text files 
that are in blocks of 10,000 rows as I recall.  We then copy each file 
into the DB (there were some issues with copying in an entire 9 mil row 
file in the past, which is why we don't use just one file -- those issues 
have been fixed, but we didn't undo the change).
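
The per-file load step itself is just a COPY, roughly (the path here is a 
stand-in):

-- repeated once per ~10,000-row block file
COPY detail_summary FROM '/data/summarizer/summary_block_0001.txt';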


What's your maintenance_work_mem setting? It can make a big difference in 
sorting the data for indexes.


6 gigs currently. :)

If you could post the schema including the indexes, people might have more 
ideas...


I'll have to ask first, but I'll see if I can :)

Talk to you later, and thanks for the info!


Steve



Re: [PERFORM] Configuration Advice

2007-01-17 Thread Steve

Note that you only need to have the ASC and DESC versions of opclasses when
you are going to use multicolumn indexes with some columns in ASC order and
some in DESC order. For columns used by themselves in an index, you don't
need to do this, no matter which order you are sorting on.



Yeah, I assumed the people 'in the know' on this kind of stuff would know 
the details of why I have to have those, and therefore I wouldn't have to 
go into detail as to why -- but you put your finger right on it. :) 
Unfortunately the customer this is for wants certain columns joined at the 
hip for querying and sorting, and this method was a performance godsend 
when we implemented it (with a C .so library, not using SQL in our 
opclasses or anything like that).
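
For what it's worth, the case that still needs those custom opclasses is 
the mixed-order multicolumn index, something like this sketch (the column 
pairing is made up; date_revop is the reverse-sort opclass shown later in 
these messages):

CREATE INDEX ds_member_receipt_rev
    ON detail_summary (member_name, receipt date_revop);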



Steve



Re: [PERFORM] Configuration Advice

2007-01-17 Thread Steve

Generally speaking, once you've gotten to the point of swapping, even a
little, you've gone too far.  A better approach is to pick some
conservative number, like 10-25% of your RAM for shared_buffers, and 1
gig or so for maintenance_work_mem, and then increase them while
exercising the system, and measure the difference increasing them makes.

If going from 1G shared buffers to 2G shared buffers gets you a 10%
increase, then good.  If going from 2G to 4G gets you a 1.2% increase,
it's questionable.  You should reach a point where throwing more
shared_buffers stops helping before you start swapping.  But you might
not.

Same goes for maintenance_work_mem.  Incremental changes, accompanied by
reproducible benchmarks / behaviour measurements, are the way to
determine the settings.

Note that you can also vary those during different times of the day.
you can have maint_mem set to 1Gig during the day and crank it up to 8
gig or something while loading data.  Shared_buffers can't be changed
without restarting the db though.



I'm currently benchmarking various configuration adjustments.  Problem is 
these tests take a really long time because I have to run the load 
process... which is like a 9 hour deal.  That's why I'm asking for advice 
here, because there's a lot of variables here and it's really time costly 
to test :)


I'm still working on the benchmarkings and by Friday I should have some 
interesting statistics to work with and maybe help figure out what's going 
on.



Thanks!

Steve



Re: [PERFORM] Configuration Advice

2007-01-17 Thread Steve
This would probably also be another last ditch option. :)  Our stuff is 
designed to pretty much work on any DB but there are postgres specific 
things in there... not to mention ramp up time on MySQL.  I mean, I know 
MySQL from a user point of view and in a very limited way 
administratively, but I'd be back to square one on learning performance 
stuff :)


Anyway -- I'll listen to what people have to say, and keep this in mind. 
It would be an interesting test to take parts of the process and compare 
at least, if not converting the whole thing.


talk to you later,

Steve

On Wed, 17 Jan 2007, Adam Rich wrote:



Sorry if this came off sounding trollish.  All databases have their
strengths & weaknesses, and I feel the responsible thing to do is
exploit
those strengths where possible, rather than expend significant time and
effort coaxing one database to do something it wasn't designed to.
There's just no such thing as "one size fits all".

I have professional experience with MS-SQL, Oracle, MySQL, and Postgres,
and the scenario described sounds more ideal for MySQL & MyISAM than
anything else:

1) No concerns for data loss (turning fsync & full_page_writes off)
since the data can be reloaded

2) No need for MVCC or transactions, since the database is read-only

3) No worries about lock contention

4) Complex queries that might take advantage of the MySQL "Query Cache"
since the base data never changes

5) Queries that might obtain data directly from indexes without having
to touch tables (again, no need for MVCC)

If loading in the base data and creating the summary table is taking
a lot of time, using MySQL with MyISAM tables (and binary logging
disabled) should provide significant time savings, and it doesn't
sound like there's any concerns for the downsides.

Yes, postgresql holds an edge over MySQL for heavy OLTP applications,
I use it for that and I love it.  But for the scenario the original
poster is asking about, MySQL/MyISAM is ideal.




-Original Message-
From: Bricklen Anderson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 17, 2007 3:29 PM
To: Adam Rich
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Configuration Advice


Adam Rich wrote:

Doesn't sound like you want postgres at all. Try mysql.


Could you explain your reason for suggesting mysql? I'm simply curious
why you would offer that as a solution.







Re: [PERFORM] Configuration Advice

2007-01-17 Thread Steve

The thought:

- Load the big chunk of data into a new table

- Generate some minimal set of indices on the new table

- Generate four queries that compare old to new:
 q1 - See which tuples are unchanged from yesterday to today
 q2  - See which tuples have been deleted from yesterday to today
 q3  - See which tuples have been added
 q4  - See which tuples have been modified

If the "unchanged" set is extremely large, then you might see benefit
to doing updates based on deleting the rows indicated by q2,
inserting rows based on q3, and updating based on q4.

In principle, computing and applying those 4 queries might be quicker
than rebuilding from scratch.

In principle, applying q2, then q4, then vacuuming, then q3, ought to
be "optimal."



	This looks like an interesting idea, and I'm going to take a look 
at how feasible it'll be to implement.  I may be able to combine this 
with Mr. Wagner's idea to make a much more efficient system overall.  It's 
going to be a pretty big programming task, but I've a feeling this 
summarizer thing may just need to be re-written with a smarter system 
like this to get something faster.
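
A rough shape for the q2/q4/q3 steps, applied in the order suggested above 
(a sketch -- the table names, join key, and summarized column are 
placeholders):

-- q2: rows gone since yesterday
DELETE FROM summary s
 WHERE NOT EXISTS (SELECT 1 FROM summary_new n WHERE n.id = s.id);

-- q4: rows present in both but changed
UPDATE summary s
   SET total = n.total
  FROM summary_new n
 WHERE n.id = s.id
   AND n.total IS DISTINCT FROM s.total;

-- VACUUM here, then
-- q3: brand-new rows
INSERT INTO summary
SELECT n.* FROM summary_new n
 WHERE NOT EXISTS (SELECT 1 FROM summary s WHERE s.id = n.id);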



Thanks!

Steve



[PERFORM] Question about PGSQL functions

2007-03-08 Thread Steve

Hey there :)

I'm re-writing a summarization process that used to be very 'back and 
forth' intensive (i.e. very chatty between my summarization software and 
the DB).  I'm trying to reduce that by practically reducing all this back 
and forth to a single SQL query, but in order to support certain 
complexities it looks to me like I'm going to have to write some postgres 
C language functions.


This is something I'm actually familiar with and have done before, but let 
me describe what I'm trying to do here so I can be sure that this is the 
right thing to do, and to be sure I do it correctly and don't cause memory 
leaks :)



---

I have two columns, one called "procedure_code" and the other called 
"wrong_procedure_code" in my summary table.  These are both of type 
varchar(32) I believe or possibly text -- if it matters I can double check 
but because all the involved columns are the same type and size it 
shouldn't matter. :)


These are actually populated by the "procedure_code" and 
"corrected_procedure_code" in the source table.  The logic is, basically:


IF  strlen(source.corrected_procedure_code)
THEN:
  summary.procedure_code=source.corrected_procedure_code
  summary.wrong_procedure_code=source.procedure_code
ELSE:
  summary.procedure_code=source.procedure_code
  summary.wrong_procedure_code=NULL


Simple, right?  Making a C function to handle this should be no sweat -- 
I would basically split this logic into two separate functions, one to 
populate summary.procedure_code and one to populate 
summary.wrong_procedure_code, and it removes the need of having any sort 
of back and forth between the program and DB... I can just do like:


update summary_table
 set procedure_code=pickCorrect(source.procedure_code,
 source.corrected_procedure_code),
 wrong_procedure_code=pickWrong(source.procedure_code,
 source.corrected_procedure_code)
 from source where summary_table.source_id=source.source_id;


Make sense?  So question 1, is this the good way to do all this?


Question 2: Assuming it is the good way to do all this, would this 
function be correct assuming I did all the other stuff right (like 
PG_FUNCTION_INFO_V1, etc.):


Datum pickCorrect(PG_FUNCTION_ARGS){
text*   procedure_code=PG_GETARG_TEXT_P(0);
text*   corrected_code=PG_GETARG_TEXT_P(1);

if(VARSIZE(corrected_code)-VARHDRSZ){
PG_RETURN_TEXT_P(corrected_code);
}else{
PG_RETURN_TEXT_P(procedure_code);
}
}

Would that simply work because I'm not actually modifying the data, or 
would I have to palloc a separate chunk of memory, copy the data, and 
return the newly allocated memory because the memory allocated for the 
args "goes away" or gets corrupted or something otherwise?




Thanks a lot for the info!


Steve



Re: [PERFORM] Question about PGSQL functions

2007-03-08 Thread Steve

Steve wrote:

IF  strlen(source.corrected_procedure_code)
THEN:
  summary.procedure_code=source.corrected_procedure_code
  summary.wrong_procedure_code=source.procedure_code
ELSE:
  summary.procedure_code=source.procedure_code
  summary.wrong_procedure_code=NULL


Um, so you test if source.corrected_procedure_code is an empty string? And if 
it is, summary.procedure_code is set to an empty string? But in 
wrong_procedure_code, you use NULLs?


	Yeah; we could use empty strings if that makes it easier for 
whatever reason, but to our front end software NULL vs. empty string 
doesn't actually matter, and we never query based on these columns; 
they're for display purposes only.


Simple, right?  Making a C function to handle this should be no sweat -- I 
would basically split this logic into two separate functions, one to 
populate summary.procedure_code and one to populate 
summary.wrong_procedure_code, and it removes the need of having any sort of 
back and forth between the program and DB... I can just do like:


update summary_table
 set procedure_code=pickCorrect(source.procedure_code,
 source.corrected_procedure_code),
 wrong_procedure_code=pickWrong(source.procedure_code,
 source.corrected_procedure_code)
 from source where summary_table.source_id=source.source_id;


ISTM you could write this easily with a little bit of SQL, with no need for 
C-functions (I haven't run this, probably full of typos..) :


update summary_table
 set procedure_code = (CASE WHEN source.corrected_procedure_code = '' THEN
source.procedure_code ELSE source.corrected_procedure_code END),
 wrong_procedure_code = (CASE WHEN source.corrected_procedure_code = ''
THEN NULL ELSE source.procedure_code END)

 from source where summary_table.source_id=source.source_id;


	This looks interesting and I'm going to give this a shot tomorrow 
and see how it goes.  Speed is somewhat of an issue which is why I 
initially thought of the C function -- plus I wasn't aware you could do 
CASE statements like that :)  Thanks for the idea!



Steve



[PERFORM] Question about memory allocations

2007-04-11 Thread Steve

Hey there;

I'm trying to tune the memory usage of a new machine that has a -lot- of 
memory in it (32 gigs).  We're upgrading from a machine that had 16 gigs 
of RAM and using a database that's around 130-some gigs on disc.  Our 
largest tables have in the order of close to 10 million rows.


Problem is, the postgres documentation isn't always clear about what 
the different memory settings are used for, and it's definitely not clear 
about what 'useful values' would be for various things.  Further, looking 
online gets a lot of random stuff, and most of the configuration 
information out there is for pre-8.1 versions that don't have all these 
new and strange values :)


This machine exists only for the database.  With that in mind, a few 
questions.



- I've set up a configuration (I'll show important values below), and 
I'm wondering if there's any way I can actually see the distribution of 
memory in the DB and how the memory is being used.


- What is temp_buffers used for exactly?  Does this matter for, say, 
nested queries or anything in specific?  Is there any case where having 
this as a large number actually -helps-?


- Do full_page_writes and wal_buffers settings matter AT ALL for a machine 
where fsync = off ?


- What does wal_buffers mean and does increasing this value actually help 
anything?


- Any idea if this is a smart configuration for this machine?  It's a 
Redhat Enterprise Linux machine (kernel 2.6.18), 8 dual-core AMD 64bit 
processors, 32 gigs of RAM, 4x 176 (or whatever the exact number is) gig 
SCSI hard drives in a stripe.  Only values I have modified are mentioned, 
everything else left at default:


shared_buffers = 16GB
temp_buffers = 128MB
max_prepared_transactions = 0

# This value is going to probably set off cries of using this as a set
# command instead of a big global value; however there's more big queries
# than small ones and the number of simultaneous users is very small so
# 'for now' this can be set globally big and if it shows improvement
# I'll implement it as set commands later.
#
# Question; does this mean 2 gigs will be immediately allocated to
# every query, or is this just how big the work memory is allowed to
# grow per transaction?
work_mem=2G

maintenance_work_mem = 4GB
max_stack_depth = 16MB

# Vacuum suggested I make this 'over 360' on the old machine, so
# I use this value; if it's too big, this is a symptom of another problem,
# I'd be interested to know :)
max_fsm_pages = 500

# For a lot of reasons, it doesn't make any sense to use fsync for this
# DB.  Read-only during the day, backed up daily, UPS'd, etc.
fsync = off
full_page_writes = off
wal_buffers = 512MB

# Leaving this low makes the DB complain, but I'm not sure what's 
# reasonable.

checkpoint_segments = 128

random_page_cost = 1.5
cpu_tuple_cost = 0.001
cpu_index_tuple_cost = 0.0005
cpu_operator_cost = 0.00025
effective_cache_size = 8GB

default_statistics_target = 100




Thanks for all your help!

Steve



Re: [PERFORM] Question about memory allocations

2007-04-12 Thread Steve



Steve <[EMAIL PROTECTED]> writes:

- What is temp_buffers used for exactly?


Temporary tables.  Pages of temp tables belonging to your own backend
don't ever get loaded into the main shared-buffers arena, they are read
into backend-local memory.  temp_buffers is the max amount (per backend)
of local memory to use for this purpose.
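
The explicitly-declared case being, for example (a sketch with made-up 
names):

-- pages of this table are cached in backend-local temp_buffers,
-- never in shared_buffers
CREATE TEMPORARY TABLE scratch_ids (cobol_id text, summary_id integer);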


	Are these only tables explicitly stated as 'temporary' (which as I 
recall is a CREATE TABLE option), or are temporary tables used for other 
things too, like, say, nested queries or other little in-the-background things?



- Any idea if this is a smart configuration for this machine?


Um ... you didn't mention which PG version?



	The latest and greatest stable as downloaded a couple days ago. 
8.2.3. :)



Thanks for the info!


Steve



[PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve
813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))



Any thoughts?  Both encounter_id and receipt date are indexed columns. 
I've vacuumed and analyzed the table.  I tried making a combined index of 
encounter_id and receipt and it hasn't worked out any better.



Thanks!


Steve



Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

Could we see the exact definition of that table and its indexes?
It looks like the planner is missing the bitmap scan for some reason,
but I've not seen a case like that before.

Also, I assume the restriction on receipt date is very nonselective?
It doesn't seem to have changed the estimated rowcount much.



	This is true -- This particular receipt date is actually quite 
meaningless.  It's equivalent to saying 'all receipt dates'.  I don't 
think there's even any data that goes back before 2005.


Here's the table and its indexes.  Before looking, a note: there are 
several 'revop' indexes; these are for sorting.  The customer insisted on, 
frankly, meaninglessly complicated sorts.  I don't think any of that 
matters for our purposes here though :)


        Column         |          Type          | Modifiers
-----------------------+------------------------+--------------------------------------------------------------------
 detailsummary_id      | integer                | not null default nextval(('detailsummary_id_seq'::text)::regclass)

 detailgroup_id| integer|
 receipt   | date   |
 batchnum  | integer|
 encounternum  | integer|
 procedureseq  | integer|
 procedurecode | character varying(5)   |
 wrong_procedurecode   | character varying(5)   |
 batch_id  | integer|
 encounter_id  | integer|
 procedure_id  | integer|
 carrier_id| integer|
 product_line  | integer|
 provider_id   | integer|
 member_num| character varying(20)  |
 wrong_member_num  | character varying(20)  |
 member_name   | character varying(40)  |
 patient_control   | character varying(20)  |
 rendering_prov_id | character varying(15)  |
 rendering_prov_name   | character varying(30)  |
 referring_prov_id | character varying(15)  |
 referring_prov_name   | character varying(30)  |
 servicedate   | date   |
 wrong_servicedate | date   |
 diagnosis_codes   | character varying(5)[] |
 wrong_diagnosis_codes | character varying(5)[] |
 ffs_charge| double precision   |
 export_date   | date   |
 hedis_date| date   |
 raps_date | date   |
 diagnosis_pointers| character(1)[] |
 modifiers | character(2)[] |
 units | double precision   |
 pos   | character(2)   |
 isduplicate   | boolean|
 duplicate_id  | integer|
 encounter_corrected   | boolean|
 procedure_corrected   | boolean|
 numerrors | integer|
 encerrors_codes   | integer[]  |
 procerror_code| integer|
 error_servicedate | text   |
 e_duplicate_id| integer|
 ecode_counts  | integer[]  |
 p_record_status   | integer|
 e_record_status   | integer|
 e_delete_date | date   |
 p_delete_date | date   |
 b_record_status   | integer|
 b_confirmation| character varying(20)  |
 b_carrier_cobol_id| character varying(16)  |
 b_provider_cobol_id   | character varying(20)  |
 b_provider_tax_id | character varying(16)  |
 b_carrier_name| character varying(50)  |
 b_provider_name   | character varying(50)  |
 b_submitter_file_id   | character varying(40)  |
 e_hist_carrier_id | integer|
 p_hist_carrier_id | integer|
 e_duplicate_id_orig   | character varying(25)  |
 p_duplicate_id_orig   | character varying(25)  |
 num_procerrors| integer|
 num_encerrors | integer|
 export_id | integer|
 raps_id   | integer|
 hedis_id  | integer|
Indexes:
"detail_summary_b_record_status_idx" btree (b_record_status)
"detail_summary_batch_id_idx" btree (batch_id)
"detail_summary_batchnum_idx" btree (batchnum)
"detail_summary_carrier_id_idx" btree (carrier_id)
"detail_summary_duplicate_id_idx" btree (duplicate_id)
"detail_summary_e_record_status_idx" btree (e_record_status)
"detail_summary_encounter_id_idx" btree (encounter_id)
"detail_summary_encounternum_idx" btree (encounternum)
"detail_summary_export_date_idx" btree (export_date)
"detail_summary_hedis_date_idx" btree (hedis_date)
"detail_summary_member_name_idx" btree (member_name)
"detail_summary_member_num_idx" btree (member_num)
"detail_summary_p_record_status_idx" 

Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve


Oy vey ... I hope this is a read-mostly table, because having that many
indexes has got to be killing your insert/update performance.


	Hahaha yeah these are read-only tables.  Nightly inserts/updates. 
Takes a few hours, depending on how many records (between 4 and 10 
usually).  But during the day, while querying, read only.



I see that some of the revop indexes might be considered relevant to
this query, so how exactly have you got those opclasses defined?
There's built-in support for reverse sort as of CVS HEAD, but in
existing releases you must have cobbled something together, and I wonder
if that could be a contributing factor ...


Here's the revops (the c functions are at the bottom):

CREATE FUNCTION ddd_date_revcmp(date, date) RETURNS integer
AS '/usr/local/pgsql/contrib/cmplib.so', 'ddd_date_revcmp'
LANGUAGE c STRICT;

CREATE FUNCTION ddd_int_revcmp(integer, integer) RETURNS integer
AS '/usr/local/pgsql/contrib/cmplib.so', 'ddd_int_revcmp'
LANGUAGE c STRICT;

CREATE FUNCTION ddd_text_revcmp(text, text) RETURNS integer
AS '/usr/local/pgsql/contrib/cmplib.so', 'ddd_text_revcmp'
LANGUAGE c STRICT;

CREATE OPERATOR CLASS date_revop
FOR TYPE date USING btree AS
OPERATOR 1 >(date,date) ,
OPERATOR 2 >=(date,date) ,
OPERATOR 3 =(date,date) ,
OPERATOR 4 <=(date,date) ,
OPERATOR 5 <(date,date) ,
FUNCTION 1 ddd_date_revcmp(date,date);

CREATE OPERATOR CLASS int4_revop
FOR TYPE integer USING btree AS
OPERATOR 1 >(integer,integer) ,
OPERATOR 2 >=(integer,integer) ,
OPERATOR 3 =(integer,integer) ,
OPERATOR 4 <=(integer,integer) ,
OPERATOR 5 <(integer,integer) ,
FUNCTION 1 ddd_int_revcmp(integer,integer);

CREATE OPERATOR CLASS text_revop
FOR TYPE text USING btree AS
OPERATOR 1 >(text,text) ,
OPERATOR 2 >=(text,text) ,
OPERATOR 3 =(text,text) ,
OPERATOR 4 <=(text,text) ,
OPERATOR 5 <(text,text) ,
FUNCTION 1 ddd_text_revcmp(text,text);

Datum   ddd_date_revcmp(PG_FUNCTION_ARGS){
DateADT arg1=PG_GETARG_DATEADT(0);
DateADT arg2=PG_GETARG_DATEADT(1);

PG_RETURN_INT32(arg2 - arg1);
}


Datum   ddd_int_revcmp(PG_FUNCTION_ARGS){
int32   arg1=PG_GETARG_INT32(0);
int32   arg2=PG_GETARG_INT32(1);

PG_RETURN_INT32(arg2 - arg1);
}

Datum   ddd_text_revcmp(PG_FUNCTION_ARGS){
char*   arg1=(char*)VARDATA(PG_GETARG_TEXT_P(0));
char*   arg2=(char*)VARDATA(PG_GETARG_TEXT_P(1));

if((*arg1) != (*arg2)){
PG_RETURN_INT32(*arg2 - *arg1);
}else{
PG_RETURN_INT32(strcmp(arg2,arg1));
}
}





Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

  Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4)
Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
(2 rows)


How accurate is the row estimate made by the planner?  (explain analyze
to be sure)



Results:

 Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4) 
(actual time=62871.386..257258.249 rows=112 loops=1)
   Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))

 Total runtime: 257258.652 ms





Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

[ itch... ]  That code is just completely wrong, because the contents
of a TEXT datum aren't guaranteed null-terminated.  It'd be better to
invoke bttextcmp and negate its result.

That's not relevant to your immediate problem, but if you've noticed
any strange behavior with your text_revop indexes, that's the reason...


	The indexes have all worked, though I'll make the change anyway. 
Documentation on how to code these things is pretty sketchy and I believe 
I followed an example on the site if I remember right. :/


Thanks for the info though :)


Steve



Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve


So there's a misjudgment of the number of rows returned by a factor of
about 88.  That's pretty big.  Since you had the same number without the
receipt date (I think...) then it's the encounter_id that's not being
counted right.

Try upping the stats target on that column and running analyze again and
see if you get closer to 112 in your analyze or not.



If I max the statistics targets at 1000, I get:

 Seq Scan on detail_summary ds  (cost=0.00..1903030.26 rows=1099 width=4)
   Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))



	Setting it to 500 makes the estimated rows twice as much.  It 
seems to have no effect on anything though, either way. :)
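
For reference, the per-column form of that adjustment is roughly (a sketch):

ALTER TABLE detail_summary ALTER COLUMN encounter_id SET STATISTICS 1000;
ANALYZE detail_summary;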



Steve



Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

Here's my planner parameters:

seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 1.5  # same scale as above
cpu_tuple_cost = 0.001  # same scale as above
cpu_index_tuple_cost = 0.0005   # same scale as above
cpu_operator_cost = 0.00025 # same scale as above
effective_cache_size = 8192MB

default_statistics_target = 100 # range 1-1000


On a machine with 16 gigs of RAM.  I tried to make it skew towards 
indexes.  However, even if I force it to use the indexes 
(enable_seqscan=off) it doesn't make it any faster really :/


Steve

On Thu, 12 Apr 2007, Tom Lane wrote:


Scott Marlowe <[EMAIL PROTECTED]> writes:

So there's a misjudgment of the number of rows returned by a factor of
about 88.  That's pretty big.  Since you had the same number without the
receipt date (I think...) then it's the encounter_id that's not being
counted right.


I don't think that's Steve's problem, though.  It's certainly
misestimating, but nonetheless the cost estimate for the seqscan is
1902749.83 versus 14819.81 for the bitmap scan; it should've picked
the bitmap scan anyway.

I tried to duplicate the problem here, without any success; I get

Bitmap Heap Scan on detail_summary ds  (cost=422.01..801.27 rows=100 width=4)
  Recheck Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))
  Filter: (receipt >= '1998-12-30'::date)
  ->  Bitmap Index Scan on detail_summary_encounter_id_idx  (cost=0.00..421.98 
rows=100 width=0)
Index Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))

so either this has been fixed by a post-8.2.3 bug fix (which I doubt,
it doesn't seem familiar at all) or there's some additional contributing
factor.  Steve, are you using any nondefault planner parameters?

regards, tom lane






Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

It's a redhat enterprise machine running AMD x64 processors.

Linux ers3.dddcorp.com 2.6.9-42.0.10.ELsmp #1 SMP Fri Feb 16 17:13:42 EST 
2007 x86_64 x86_64 x86_64 GNU/Linux


It was compiled by me, straight up, nothing weird at all, no odd compiler 
options or whatever :)


So yeah :/ I'm quite baffled as well,

Talk to you later,

Steve


On Thu, 12 Apr 2007, Tom Lane wrote:


Steve <[EMAIL PROTECTED]> writes:

Here's my planner parameters:


I copied all these, and my 8.2.x still likes the bitmap scan a lot
better than the seqscan.  Furthermore, I double-checked the CVS history
and there definitely haven't been any changes in that area in REL8_2
branch since 8.2.3.  So I'm a bit baffled.  Maybe the misbehavior is
platform-specific ... what are you on exactly?  Is there anything
nonstandard about your Postgres installation?

regards, tom lane





Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

Table size: 16,037,728 rows

With enable_seqscan=off I get:

 Bitmap Heap Scan on detail_summary ds  (cost=4211395.20..4213045.32 
rows=1099 width=4)
   Recheck Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
   ->  Bitmap Index Scan on detail_summary_receipt_encounter_idx 
(cost=0.00..4211395.17 rows=1099 width=0)
 Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = 
ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))



The explain analyze is pending, running it now (however it doesn't really 
appear to be any faster using this plan).



Steve

On Thu, 12 Apr 2007, Tom Lane wrote:


Steve <[EMAIL PROTECTED]> writes:

... even if I force it to use the indexes
(enable_seqscan=off) it doesn't make it any faster really :/


Does that change the plan, or do you still get a seqscan?

BTW, how big is this table really (how many rows)?

regards, tom lane






Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

Here's the explain analyze with seqscan = off:

 Bitmap Heap Scan on detail_summary ds  (cost=4211395.20..4213045.32 
rows=1099 width=4) (actual time=121288.825..121305.908 rows=112 loops=1)
   Recheck Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
   ->  Bitmap Index Scan on detail_summary_receipt_encounter_idx 
(cost=0.00..4211395.17 rows=1099 width=0) (actual 
time=121256.681..121256.681 rows=112 loops=1)
 Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = 
ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))

 Total runtime: 121306.233 ms


Your other question is answered in the other mail along with the 
non-analyze'd query plan :D


Steve

On Thu, 12 Apr 2007, Tom Lane wrote:


Steve <[EMAIL PROTECTED]> writes:

... even if I force it to use the indexes
(enable_seqscan=off) it doesn't make it any faster really :/


Does that change the plan, or do you still get a seqscan?

BTW, how big is this table really (how many rows)?

regards, tom lane






Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

If the other indexes are removed, with enable_seqscan=on:

 Bitmap Heap Scan on detail_summary ds  (cost=154.10..1804.22 rows=1099 
width=4)
   Recheck Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))

   Filter: (receipt >= '1998-12-30'::date)
   ->  Bitmap Index Scan on detail_summary_encounter_id_idx 
(cost=0.00..154.07 rows=1099 width=0)
 Index Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))



With it off:

 Bitmap Heap Scan on detail_summary ds  (cost=154.10..1804.22 rows=1099 
width=4)
   Recheck Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))

   Filter: (receipt >= '1998-12-30'::date)
   ->  Bitmap Index Scan on detail_summary_encounter_id_idx 
(cost=0.00..154.07 rows=1099 width=0)
 Index Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))



Either way, it runs perfectly fast.  So it looks like the indexes are 
confusing this query like you suspected.  Any advice?  This isn't the only 
query we run on this table, and much as I'd absolutely love to kill off some 
indexes to improve our nightly load times, I can't foul up the other 
queries :)



Thank you very much for all your help on this issue, too!

Steve

On Thu, 12 Apr 2007, Tom Lane wrote:


Steve <[EMAIL PROTECTED]> writes:

With enable_seqscan=off I get:



->  Bitmap Index Scan on detail_summary_receipt_encounter_idx
(cost=0.00..4211395.17 rows=1099 width=0)
  Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id =
ANY ...



The explain analyze is pending, running it now (however it doesn't really
appear to be any faster using this plan).


Yeah, that index is nearly useless for this query --- since the receipt
condition isn't really eliminating anything, it'll

Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

Okay -- I started leaving indexes on one by one.

The explain broke when the detail_summary_receipt_encounter_idx index was 
left on (receipt, encounter_id).


Just dropping that index had no effect, but there's a LOT of indexes that 
refer to receipt.  So on a hunch I tried dropping all indexes that refer 
to receipt date and that worked -- so it's the indexes that contain 
receipt date that are the problem.


For more fun, I tried leaving the index that's just receipt date alone 
(index detail_summary_receipt_id_idx) and THAT produced the correct plan; 
it's all these multi-column indexes that are fouling things up, it would 
seem!



 So does this mean I should experiment with dropping those indexes? 
I'm not sure if that will result in 'bad things', as there are other 
complicated actions like sorts that may run really slowly if I drop those 
indexes.  BUT I think it'll be easy to convince the customer to drop their 
absurdly complicated sorts if I can come back with serious results like 
what we've worked out here.



And thanks again -- have a good dinner! :)

Steve


On Thu, 12 Apr 2007, Tom Lane wrote:


Steve <[EMAIL PROTECTED]> writes:

Either way, it runs perfectly fast.  So it looks like the indexes are
confusing this query like you suspected.  Any advice?


Wow --- sometimes grasping at straws pays off.  I was testing here with
just a subset of the indexes to save build time, but I bet that one of
the "irrelevant" ones is affecting this somehow.  Time to re-test.

If you have some time to kill, it might be interesting to vary that
begin/rollback test script to leave one or two other indexes in place,
and see if you can identify exactly which other index(es) get it
confused.
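
For reference, a minimal version of that begin/rollback test might look like 
the following (the index name and the cut-down query are just examples pulled 
from this thread -- substitute the real ones; since DDL is transactional, the 
ROLLBACK puts any dropped indexes right back):

BEGIN;
DROP INDEX detail_summary_receipt_encounter_idx;
EXPLAIN
  SELECT ds.encounter_id
    FROM detail_summary ds
   WHERE ds.encounter_id = ANY ('{8813186,8815534}'::integer[])
     AND ds.receipt >= '1998-12-30'::date;
ROLLBACK;  -- restores the dropped index untouched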

I'm about to go out to dinner with the wife, but will have a closer
look when I get back, or tomorrow morning.  We'll figure this out.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

Here you go:

 detail_summary_b_record_status_idx
 detail_summary_batch_id_idx
 detail_summary_batchnum_idx
 detail_summary_carrier_id_idx
 detail_summary_duplicate_id_idx
 detail_summary_e_record_status_idx
 detail_summary_encounter_id_idx
 detail_summary_encounternum_idx
 detail_summary_export_date_idx
 detail_summary_hedis_date_idx
 detail_summary_member_name_idx
 detail_summary_member_num_idx
 detail_summary_p_record_status_idx
 detail_summary_patient_control_idx
 detail_summary_procedurecode_idx
 detail_summary_product_line_idx
 detail_summary_provider_id_idx
 detail_summary_raps_date_idx
 detail_summary_receipt_id_idx
 detail_summary_referrering_prov_id_idx
 detail_summary_rendering_prov_id_idx
 detail_summary_rendering_prov_name_idx
 detail_summary_servicedate_idx
 ds_sort_1
 ds_sort_10
 ed_cbee_norev
 ed_cbee_norev_p
 ed_cbee_rev
 ed_cbee_rev_p
 mcbe
 mcbe_p
 mcbe_rev
 mcbe_rev_p
 mcbee_norev
 mcbee_norev_p
 mcbee_rev
 mcbee_rev_p
 pcbee_norev
 pcbee_norev_p
 pcbee_rev
 pcbee_rev_p
 rcbee_norev
 rcbee_norev_p
 rp_cbee_norev
 rp_cbee_norev_p
 rp_cbee_rev
 rp_cbee_rev_p
 sd_cbee_norev
 sd_cbee_norev_p
 sd_cbee_rev
 sd_cbee_rev_p
 testrev
 testrev_p
 detail_summary_receipt_encounter_idx


On Thu, 12 Apr 2007, Tom Lane wrote:


Steve <[EMAIL PROTECTED]> writes:

Just dropping that index had no effect, but there's a LOT of indexes that
refer to receipt.  So on a hunch I tried dropping all indexes that refer
to receipt date and that worked -- so it's the indexes that contain
receipt date that are the problem.


I'm still not having any luck reproducing the failure here.  Grasping at
straws again, I wonder if it's got something to do with the order in
which the planner examines the indexes --- which is OID order.  Could
you send me the results of

select indexrelid::regclass from pg_index where indrelid = 
'detail_summary'::regclass order by indexrelid;

regards, tom lane



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Question about memory allocations

2007-04-12 Thread Steve
I didn't notice anyone address this for you yet.  There is a tool in 
contrib/pg_buffercache whose purpose in life is to show you what the shared 
buffer cache has inside it.  The documentation in that directory leads you 
through installing it.  The additional variable you'll likely never know is 
what additional information is inside the operating system's buffer cache.
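
For example, once the module is installed, a query along these lines shows 
which relations are currently occupying the shared buffer cache (view and 
column names as documented for contrib/pg_buffercache; adjust for your 
version):

SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = c.relfilenode
 GROUP BY c.relname
 ORDER BY buffers DESC
 LIMIT 10;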


Okay -- thanks!  I'll take a look at this.

# Leaving this low makes the DB complain, but I'm not sure what's
# reasonable.

checkpoint_segments = 128


That's a reasonable setting for a large server.  The main downside to setting 
it that high is longer recovery periods after a crash, but I doubt that's a 
problem for you if you're so brazen as to turn off fsync.


	Hahaha yeah.  It's 100% assumed that if something goes bad we're 
restoring from the previous day's backup.  However, because the DB is 
read-only for -most- of the day and only read/write at night, it's an 
acceptable risk for us anyway.  But good to know that's a reasonable value.



Steve

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Question about memory allocations

2007-04-13 Thread Steve

Really?

Wow!

Common wisdom in the past has been that values above a couple of hundred
MB will degrade performance.  Have you done any benchmarks on 8.2.x that
show that you get an improvement from this, or did you just take the
"too much of a good thing is wonderful" approach?



	Not to be rude, but there's more common wisdom on this particular 
subject than anything else in postgres, I'd say ;)  I think I recently read 
someone else on this list who laundry-listed the recommended memory 
values that are out there these days, and pretty much it ranges from 
what you've just said to "half of system memory".


	I've tried many memory layouts, and in my own experience with 
this huge DB, more -does- appear to be better, but only marginally so; more 
memory alone won't fix a speed problem.  It may be a function of how much 
reading/writing is done to the DB and whether fsync is used or not, if that 
makes any sense :)  Seems there's no "silver bullet" to the shared-memory 
question.  Or if there is, nobody can agree on it ;)



Anyway, talk to you later!


Steve

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] Copy database performance issue

2006-10-23 Thread Steve

Hello there;

I've got an application that has to copy an existing database to a new 
database on the same machine.


I used to do this with a pg_dump command piped to psql to perform the 
copy; however, the database is 18GB on disk and this takes a LONG 
time to do.


So I read up, found some things in this list's archives, and learned that 
I can use createdb --template=old_database_name to do the copy in a much 
faster way since people are not accessing the database while this copy 
happens.
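
For reference, the SQL-level equivalent of that createdb invocation is simply 
the following (database names are illustrative; the template database must 
have no other active connections while this runs):

CREATE DATABASE new_database_name TEMPLATE old_database_name;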



The problem is, it's still too slow.  My question is, is there any way I 
can use 'cp' or something similar to copy the data, and THEN after that's 
done modify the database system files/system tables to recognize the 
copied database?


For what it's worth, I've got fsync turned off, and I've read every tuning 
guide out there, so my settings are probably pretty good.  It's a 
Solaris 10 machine (V440, 2 processors, 4 Ultra320 drives, 8GB RAM) and 
here are some stats:


shared_buffers = 30
work_mem = 102400
maintenance_work_mem = 1024000

bgwriter_lru_maxpages=0
bgwriter_lru_percent=0

fsync = off
wal_buffers = 128
checkpoint_segments = 64


Thank you!


Steve Conley

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] How to improve db performance with $7K?

2005-03-25 Thread Steve Poe
Situation:  A 24/7 animal hospital (100 employees) runs their business 
on CentOS 3.3 (RHEL 3) and Postgres 7.4.2 (because they have to) off a 2-CPU 
Xeon 2.8GHz, 4GB of RAM, and (3) SCSI disks in RAID 0 (zcav value 35MB per 
sec). The database is 11GB, comprising over 100 tables and indexes from 1MB 
to 2GB in size.

I recently told the hospital management team that, worst-case scenario, they 
need to get the database on its own drive array, since the RAID 0 is a 
disaster waiting to happen. I said a new dual AMD server with a 
6/7-disk configuration would be ideal for safety and performance, but 
they don't have $15K.  I said a separate drive array offers a balance 
of safety and performance.

I have been given a budget of $7K to accomplish a safer/faster database 
through hardware upgrades.  The objective is to get a drive array, but I 
can use the budget any way I see fit to accomplish the goal.

I am a DBA novice: I did not physically build this server, nor did 
I write the application the hospital runs on. But since I have the opportunity 
to make it better, I thought I should seek some advice from those who 
have been down this road before.  Suggestions/ideas, anyone?

Thanks.
Steve Poe
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] How to improve db performance with $7K?

2005-03-25 Thread Steve Poe
Tom,
From what I understand, the vendor used ProIV for development; when 
they attempted to use 7.4.3, they had ODBC issues and something else I 
honestly don't know about, but I was told that data was not coming through 
properly. They're somewhat at the mercy of the ProIV people to give them 
the stamp of approval, and then the vendor will tell us what they support.

Thanks.
Steve Poe
Tom Lane wrote:
Steve Poe <[EMAIL PROTECTED]> writes:
 

Situation:  A 24/7 animal hospital (100 employees) runs their business 
on CentOS 3.3 (RHEL 3) Postgres 7.4.2 (because they have to)
   

[ itch... ]  Surely they could at least move to 7.4.7 without pain.
There are serious data-loss bugs known in 7.4.2.
regards, tom lane
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
 


---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] How to improve db performance with $7K?

2005-03-26 Thread Steve Poe
You could build a dual opteron with 4 GB of ram, 12 10k raptor SATA 
drives with a battery backed cache for about 7k or less.

Okay. You trust SATA drives? I've been leery of them for a production 
database. Pardon my ignorance, but what is a "battery backed cache"?  I 
know the drives have a built-in cache, but I don't know if that's the same. 
Are the 12 drives internal or in an external chassis? Could you point me to 
a place where this configuration exists? 

Or if they are not CPU bound just IO bound you could easily just
add an external 12 drive array (even if scsi) for less than 7k.
I don't believe it is CPU bound. At our busiest hour, the CPU is idle 
about 70% on average, down to 30% idle at its heaviest. Context switching 
averages about 4-5K per hour with momentary peaks to 25-30K for a 
minute.  Overall disk performance is poor (35MB per sec).

Thanks for your input.
Steve Poe


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [PERFORM] How to improve db performance with $7K?

2005-03-26 Thread Steve Poe
Steve, can we clarify that you are not currently having any performance 
issues, you're just worried about failure?   Recommendations should be based 
on whether improving application speed is a requirement ...
Josh,
The priorities are: 1) improve safety/failure-prevention, 2) improve 
performance.

The owner of the company wants greater performance (and I concur to a 
certain degree), but the owner's vote is only 1/7 of the management team, 
and the rest of the management team is not as focused on performance. 
They all agree on safety/failure-prevention.

Steve



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] How to improve db performance with $7K?

2005-03-26 Thread Steve Poe

The Chenbros are nice, but kinda pricey ($800) if Steve doesn't need the 
machine to be rackable.

If your primary goal is redundancy, you may wish to consider the possibility 
of building a brand-new machine for $7k (you can do a lot of machine for 
$7000 if it doesn't have to be rackable) and re-configuring the old machine 
and using it as a replication or PITR backup.   This would allow you to 
configure the new machine with only a moderate amount of hardware redundancy 
while still having 100% confidence in staying running.

 

Our servers are not racked, so a new one does not have to be. *If* it is 
possible, I'd like to replace the main server with a new one. I could 
tweak the new one the way I need it and work with the vendor to make 
sure everything works well. In either case, I'll still need to test how 
positioning of the tables/indexes across a RAID10 will perform.  I am 
also waiting on ProIV developers' feedback.  If their ProIV modules will 
not run under AMD64, or take advantage of the processor, then I'll stick 
with the server we have.

Steve Poe





---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] which dual-CPU hardware/OS is fastest for PostgreSQL?

2005-03-28 Thread Steve Poe
Greg Stark wrote:
"Merlin Moncure" <[EMAIL PROTECTED]> writes:
 

Alex wrote:
   

Without starting too much controversy I hope, I would seriously
recommend you evaluate the AMCC Escalade 9500S SATA controller.  
 

.
 

At the risk of shaming myself with another 'me too' post, I'd like to
say that my experiences back this up 100%.  The Escalade controllers are
excellent and the Raptor drives are fast and reliable (so far).  
   

.
I assume AMCC == 3ware now?
Has anyone verified that fsync is safe on these controllers? Ie, that they
aren't caching writes and "lying" about the write completing like IDE
drives often do?
 

For those who speak highly of the Escalade controllers and/or Raptor SATA 
drives, how is the database being utilized, OLTP or primarily read 
access? This is good information I am learning, but I also see the need 
to understand the context of how the hardware is being used.

Steve Poe
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] How to improve db performance with $7K?

2005-03-28 Thread Steve Poe
Cott Lang wrote:
Have you already considered application/database tuning?  Adding
indexes? shared_buffers large enough? etc. 

Your database doesn't seem that large for the hardware you've already
got. I'd hate to spend $7k and end up back in the same boat. :)
 

Cott,
I agree with you. Unfortunately, I am not the developer of the 
application. The vendor uses ProIV which connects via ODBC.  The vendor 
could certain do some tuning and create more indexes where applicable. I 
am encouraging the vendor to take a more active role and we work 
together on this.

With hardware tuning, I am sure we can do better than 35Mb per sec. Also 
moving the top 3 or 5 tables and indexes to their own slice of a RAID10 
and moving pg_xlog to its own drive will help too.

Since you asked about tuned settings, here's what we're using:
kernel.shmmax = 1073741824
shared_buffers = 1
sort_mem = 8192
vacuum_mem = 65536
effective_cache_size = 65536
Steve Poe


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [PERFORM] Reading recommendations

2005-03-30 Thread Steve Wampler
Mohan, Ross wrote:
> VOIP over BitTorrent?  

Now *that* I want to see.  Ought to be at least as interesting
as the "TCP/IP over carrier pigeon" experiment - and more
challenging to boot!


-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Reading recommendations

2005-03-30 Thread Steve Wampler
[EMAIL PROTECTED] wrote:

>>Mohan, Ross wrote:
>>
>>>VOIP over BitTorrent?
>>
>>Now *that* I want to see.  Ought to be at least as interesting
>>as the "TCP/IP over carrier pigeon" experiment - and more
>>challenging to boot!
>>
> 
> 
> It was very challenging.  I worked on the credit window sizing and
> retransmission timer estimation algorithms.  We took into account weather
> patterns, size and age of the bird, feeding times, and the average number
> of times a bird circles before determining magnetic north.  Interestingly,
> packet size had little effect in the final algorithms.
> 
> I would love to share them with all of you, but they're classified.

Ah, but VOIPOBT requires many people all saying the same thing at the
same time.  The synchronization alone (since you need to distribute
these people adequately to avoid overloading a trunk line...) is probably
sufficiently hard to make it interesting.  Then there are the problems of
different accents, dialects, and languages ;)

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Reading recommendations

2005-03-31 Thread Steve Wampler
Stefan Weiss wrote:
> On 2005-03-31 15:19, [EMAIL PROTECTED] wrote:
> 
>>>>>Now *that* I want to see.  Ought to be at least as interesting
>>>>>as the "TCP/IP over carrier pigeon" experiment - and more
>>>>>challenging to boot!
> 
> ..
> 
>>Interestingly, we had a follow on contract to investigate routing
>>optimization using flooding techniques.  Oddly, it was commissioned by a
>>consortium of local car washes.  Work stopped when the park service sued us
>>for the cost of cleaning all the statuary, and the company went out of
>>business.  We were serving "cornish game hens" at our frequent dinner
>>parties for months.
> 
> 
> This method might have been safer (and it works great with Apaches):
> http://eagle.auc.ca/~dreid/

Aha - VOIPOBD as well as VOIPOBT!  What more can one want?

VOIPOCP, I suppose...


-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] Follow-Up: How to improve db performance with $7K?

2005-03-31 Thread Steve Poe
Thanks for everyone's feedback on how to best improve our Postgresql 
database for the animal hospital. I re-read the PostgreSQL 8.0 
Performance Checklist just to keep focused.

We purchased (2) 4 x 146GB 10,000rpm SCSI U320 SCA drive arrays ($2600) 
and (1) Sun W2100z dual AMD64 workstation with 4GB RAM ($2500). We did 
not need a rack-mount server, so I thought Sun's workstation would do 
fine. I'll double the RAM. Hopefully, this should outperform our dual 
2.8GHz Xeon with 4GB of RAM.

Now we need to purchase a good U320 RAID card. Any suggestions for 
ones which run well under Linux?

These two drive arrays' main purpose is for our database. For those who have 
messed with drive arrays before, how would you slice up the drive array? 
Will database performance be affected by how our RAID10 is configured? Any 
suggestions?

Thanks.
Steve Poe

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] How to improve db performance with $7K?

2005-04-04 Thread Steve Poe

Alex Turner wrote:
To be honest, I've yet to run across a SCSI configuration that can
touch the 3ware SATA controllers.  I have yet to see one top 80MB/sec,
let alone 180MB/sec read or write, which is why we moved _away_ from
SCSI.  I've seen Compaq, Dell and LSI controllers all do pathetically
badly on RAID 1, RAID 5 and RAID 10.
 

Alex,
How does the 3ware controller do with heavy writes back to the database? 
It may have been Josh, but someone said that SATA does well with reads 
but not writes. Would not an equal number of SCSI drives outperform SATA?  
I don't want to start a "whose is better" war; I am just trying to learn 
here. It would seem that the more drives you place in a RAID 
configuration, the more performance would increase.

Steve Poe

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application

2005-04-06 Thread Steve Wampler
Mohan, Ross wrote:
> I wish I had a Dell system and run case to show you Alex, but I don't...
> however...using Oracle's "direct path" feature, it's pretty straightforward. 
> 
> We've done 110,000 rows per second into index-less tables on a big system
> (IBM Power5 chips, Hitachi SAN). ( Yes, I am sure: over 100K a second. 
> Sustained
> for almost 9 minutes. )
> 
> Yes, this is an exception, but oracle directpath/InsertAppend/BulkLoad
> feature enabled us to migrate a 4 TB database...really quickly. 

How close to this is PG's COPY?  I get surprisingly good results using
COPY with jdbc on smallish systems (now if that patch would make into
the mainstream PG jdbc support!)  I think COPY has a bit more overhead
than what a Bulkload feature may have, but I suspect it's not that
much more.

> Now...if you ask me "can this work without Power5 and Hitachi SAN?"
> my answer is..you give me a top end Dell and SCSI III on 15K disks
> and I'll likely easily match it, yea.
> 
> I'd love to see PG get into this range..i am a big fan of PG (just a
> rank newbie) but I gotta think the underlying code to do this has
> to be not-too-complex.

It may not be that far off if you can use COPY instead of INSERT.
But comparing Bulkload to INSERT is a bit apples<->orangish.

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Plan for relatively simple query seems to be very inefficient

2005-04-06 Thread Steve Atkins
On Wed, Apr 06, 2005 at 06:52:35PM +0200, Arjen van der Meijden wrote:
> Hi list,
> 
> I noticed on a forum a query taking a surprisingly large amount of time 
> in MySQL. Of course I wanted to prove PostgreSQL 8.0.1 could do it much 
> better. To my surprise PostgreSQL was ten times worse on the same 
> machine! And I don't understand why.
> 
> I don't really need this query to be fast since I don't use it, but the 
> range-thing is not really an uncommon query I suppose. So I'm wondering 
> why it is so slow and this may point to a wrong plan being chosen or 
> generated.

That's the wrong index type for fast range queries. You really need
something like GiST or rtree for that. I do something similar in
production and queries are down at the millisecond level with the
right index.
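
As a rough sketch of that indexing approach (this particular version uses 
range types, which only exist in PostgreSQL 9.2 and later -- on the 8.0.x in 
question the equivalent would be contrib/seg or an rtree opclass -- and the 
added column and index names are made up):

ALTER TABLE postcodes ADD COLUMN code_range int4range;
UPDATE postcodes
   SET code_range = int4range(range_from::int, range_till::int, '[]');
CREATE INDEX postcodes_code_range_gist
    ON postcodes USING gist (code_range);

SELECT count(*)
  FROM data_main dm
  JOIN postcodes p ON p.code_range @> dm.range::int;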


Cheers,
  Steve
 
> Here are table definitions:
> 
> Table "public.postcodes"
>Column| Type  | Modifiers
> -+---+---
>  postcode_id | smallint  | not null
>  range_from  | smallint  |
>  range_till  | smallint  |
> Indexes:
> "postcodes_pkey" PRIMARY KEY, btree (postcode_id)
> "range" UNIQUE, btree (range_from, range_till)
> 
>Table "public.data_main"
>  Column |   Type   | Modifiers
> +--+---
>  userid | integer  | not null
>  range  | smallint |
> Indexes:
> "data_main_pkey" PRIMARY KEY, btree (userid)
> 
> And here's the query I ran:
> 
> SELECT COUNT(*) FROM
> data_main AS dm,
> postcodes AS p
> WHERE dm.range BETWEEN p.range_from AND p.range_till

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Is Indexed View Supported in psql 7.1.3??

2005-04-10 Thread Steve Poe
stp,
I cannot help you with the first point, but as far as performance 
analysis goes, I'll share what I've been using.

1) pgbench -- which comes with PostgreSQL
2) OSDB (http://osdb.sourceforge.net/)
3) pg_autotune (http://pgfoundry.org/projects/pgautotune/)
4) PQA (http://pgfoundry.org/projects/pqa/)
You did not mention how your database is being used/going to be used. If 
it's already in production, use PQA, but I personally have not 
implemented it yet since it seemed to take a performance hit of 15-25% 
when running. Your mileage may vary.

I use pgbench for quick tests and OSDB for more disk thrash testing.
I am new to this; maybe someone else may be able to speak from more 
experience.

Regards.
Steve Poe

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] How to improve db performance with $7K?

2005-04-14 Thread Steve Poe
If SATA drives don't have the ability to replace SCSI for multi-user
Postgres apps, but you needed to save on cost (ALWAYS an issue), 
could/would you implement SATA for your logs (pg_xlog) and keep the rest 
on SCSI?

Steve Poe
Mohan, Ross wrote:
I've been doing some reading up on this, trying to keep up here, 
and have found out that (experts, just yawn and cover your ears)

1) some SATA drives (just type II, I think?) have a "Phase Zero"
   implementation of Tagged Command Queueing (the special sauce
   for SCSI).
2) This SATA "TCQ" is called NCQ and I believe it basically
   allows the disk software itself to do the reordering
   (this is called "simple" in TCQ terminology) It does not
   yet allow the TCQ "head of queue" command, allowing the
   current tagged request to go to head of queue, which is
   a simple way of manifesting a "high priority" request.
3) SATA drives are not yet multi-initiator?
Largely b/c of 2 and 3, multi-initiator SCSI RAID'ed drives
are likely to whomp SATA II drives for a while yet (read: a
year or two) in multiuser PostGres applications. 


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Greg Stark
Sent: Thursday, April 14, 2005 2:04 PM
To: Kevin Brown
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How to improve db performance with $7K?
Kevin Brown <[EMAIL PROTECTED]> writes:
 

Greg Stark wrote:
   

I think you're being misled by analyzing the write case.
Consider the read case. When a user process requests a block and 
that read makes its way down to the driver level, the driver can't 
just put it aside and wait until it's convenient. It has to go ahead 
and issue the read right away.
 

Well, strictly speaking it doesn't *have* to.  It could delay for a 
couple of milliseconds to see if other requests come in, and then 
issue the read if none do.  If there are already other requests being 
fulfilled, then it'll schedule the request in question just like the 
rest.
   

But then the cure is worse than the disease. You're basically describing 
exactly what does happen anyways, only you're delaying more requests than 
necessary. That intervening time isn't really idle, it's filled with all the 
requests that were delayed during the previous large seek...
 

Once the first request has been fulfilled, the driver can now schedule 
the rest of the queued-up requests in disk-layout order.

I really don't see how this is any different between a system that has 
tagged queueing to the disks and one that doesn't.  The only 
difference is where the queueing happens.
   

And *when* it happens. Instead of being able to issue requests while a 
large seek is happening and having some of them satisfied they have to wait 
until that seek is finished and get acted on during the next large seek.
If my theory is correct then I would expect bandwidth to be essentially 
equivalent but the latency on SATA drives to be increased by about 50% of the 
average seek time. Ie, while a busy SCSI drive can satisfy most requests in 
about 10ms a busy SATA drive would satisfy most requests in 15ms. (add to that 
that 10k RPM and 15kRPM SCSI drives have even lower seek times and no such 
IDE/SATA drives exist...)
In reality higher latency feeds into a system feedback loop causing your 
application to run slower causing bandwidth demands to be lower as well. It's 
often hard to distinguish root causes from symptoms when optimizing complex 
systems.
 


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Spend 7K *WHERE*? WAS Intel SRCS16 SATA raid? and How

2005-04-15 Thread Steve Poe
Ross,
I agree with you, but I am the lowly integrator/analyst; I have to solve the 
problem without all the authority (sounds like a project manager). I originally 
started this thread since I had the $7k budget.
I am not a dba/developer, but I play one on TV, so I can only assume that 
throwing money at the application code means one understands what the 
bottleneck in the code is and what it takes to fix it.
In this situation, the code is hidden by the vendor that connects to the 
database. So, besides persistent requests to the vendor to improve that area of 
the application, the balance of tuning lies with the hardware. The answer is 
*both* hardware and application code. Finding the right balance is key. Your 
mileage may vary.
Steve Poe


If, however, in the far-more-likely case that the application code
or system/business process is the throttle point, it'd be a great
use of money to have a test report showing that to the "higher ups". 
That's where the best scalability bang-for-buck can be made. 

- Ross
p.s. having said this, and as has already been noted, "7K" ain't
going to buy that much...maybe the ability to go RAID 10?

p.p.s  Why don't we start a PGSQL-7K listserv, to handle this EPIC thread?  :-) 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] pgbench Comparison of 7.4.7 to 8.0.2

2005-04-15 Thread Steve Poe
Tom,
People's opinions on pgbench may vary, so take what I say with a grain 
of salt. Here are my thoughts:

1) Test with no less than 200 transactions per client. I've heard that with 
fewer than this, your results will vary too much with the direction the 
wind is blowing. A high enough value will help rule out some of the "noise" 
factor. If I am wrong, please let me know.

2) How is the database going to be used?  What percentage will be 
read/write if you had to guess? Pgbench is like TPC-B and will help 
gauge your potential throughput in TPS. However, it may not stress 
the server enough to help you make key performance changes. Then again, 
benchmarks are like statistics...full of lies.

3) Run not just a couple of pgbench runs, but *many* (I do between 20-40 
runs) so you can rule out noise and gauge improvement from median results.

4) Find something with which you can test OLTP-type transactions. I used OSDB 
since it is simple to implement and use, although OSDL's OLTP testing 
will be closer to reality.

Steve Poe
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Postgresql works too slow

2005-04-17 Thread Steve Poe
Nurlan,
Try increasing your checkpoint_segments. In my case, our database 
restore took 75 minutes; after raising checkpoint_segments to 20, we cut 
it down to less than 30 minutes.  Is your pg_xlog on a separate disk..or 
at least a partition? This will help too. A checkpoint_segments of 20, 
if memory serves correctly, will occupy around 800-900MB of disk space in 
pg_xlog.

Steve Poe
Nurlan Mukhanov (AL/EKZ) wrote:
Hello. 

I'm trying to restore my database from a dump in several parallel processes, but 
the restore process is too slow.
Number of rows about 100 000 000,
RAM: 8192M
CPU: Ultra Sparc 3
Number of CPU: 4
OS: SunOS sun 5.8
RDBMS: PostgreSQL 8.0
prstat info
  PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME  CPU PROCESS/NLWP
14231 postgres   41M   37M sleep   580   0:00.01 0.2% postgres/1
14136 postgres   41M   37M sleep   580   0:00.03 0.2% postgres/1
14211 postgres   41M   37M sleep   580   0:00.01 0.2% postgres/1
14270 postgres   41M   37M sleep   580   0:00.00 0.2% postgres/1
13767 postgres   41M   37M sleep   580   0:00.18 0.2% postgres/1
13684 postgres   41M   36M sleep   580   0:00.14 0.2% postgres/1
NPROC USERNAME  SIZE   RSS MEMORY  TIME  CPU
   74 root  272M  191M   2.3%   0:26.29  24%
  124 postgres 1520M 1306M16%   0:03.05 5.0%
How do I increase postgresql speed? Why does postgres take only 5.0% of CPU time?
Nurlan Mukhanov 

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
 


---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] How to improve db performance with $7K?

2005-04-18 Thread Steve Poe
Alex,
In the situation of the animal hospital server I oversee, the 
application is OLTP. Adding hard drives (6-8) does help performance. 
Benchmarks like pgbench and OSDB agree with it, but in reality users 
could not see a noticeable change. However, moving the top 5/10 tables and 
indexes to their own space made a greater impact.

Someone who reads the PostgreSQL 8.0 Performance Checklist is going to see 
that point #1, add more disks, is the key. How about adding a subpoint 
explaining when more disks aren't enough or applicable? I may be 
generalizing the complexity of tuning an OLTP application, but some 
clarity could help.

Steve Poe


---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] pgbench Comparison of 7.4.7 to 8.0.2

2005-04-25 Thread Steve Poe
>There was some interesting oscillation behavior in both versions of 
postgres that occurred with 25 clients and 1000 transactions at a 
scaling factor of 100. This was repeatable with the distribution 
version of pgbench run iteratively from the command line. I'm not sure 
how to explain this.

Tom,
When you see these oscillations, do they occur after so many generated 
results? Some oscillation is normal; in my opinion, 10-15% of the 
performance is noise-related.

The key is to tune the server so that you either 1) minimize the 
oscillation and/or 2) increase your overall performance above the 10-15% 
baseline, and 3) find out the mean and standard deviation across 
all your results.

If your results are within that range, this may be "normal". I'll follow up 
with you later on what I do.

Steve Poe

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] pgbench Comparison of 7.4.7 to 8.0.2

2005-04-25 Thread Steve Poe
Tom,
Just a quick thought: after each run/sample of pgbench, I drop the 
database and recreate it. When I don't, my results become more skewed.

Steve Poe
Thomas F.O'Connell wrote:
Interesting. I should've included standard deviation in my pgbench 
iteration patch. Maybe I'll go back and do that.

I was seeing oscillation across the majority of iterations in the 25 
clients/1000 transaction runs on both database versions.

I've got my box specs and configuration files posted. If you see 
anything obvious about the tuning parameters that should be tweaked, 
please let me know.

Thanks for the feedback!
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i™
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Apr 25, 2005, at 1:58 AM, Steve Poe wrote:
>There was some interesting oscillation behavior in both versions of 
postgres that occurred with 25 clients and 1000 transactions at a 
scaling factor of 100. This was repeatable with the distribution 
version of pgbench run iteratively from the command line. I'm not 
sure how to explain this.

Tom,
When you see these oscillations, do they occur after so many 
generated results? Some oscillation is normal; in my opinion, 
10-15% of the performance is noise-related.

The key is to tune the server so that you either 1) minimize the 
oscillation and/or 2) increase your overall performance above the 
10-15% baseline, and 3) find out the mean and standard deviation 
across all your results.

If your results are within that range, this may be "normal". I'll 
follow up with you later on what I do.

Steve Poe



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] pgbench Comparison of 7.4.7 to 8.0.2

2005-04-26 Thread Steve Poe
Tom,
Honestly, you've got me. It was either a comment from Tom Lane or Josh 
that the OS is caching the results (I may not be using the right terms 
here), so I thought that if the database is dropped and recreated, I would 
see less of a skew (or variation) in the results. Anyone care to comment?

Steve Poe
Thomas F.O'Connell wrote:
Considering the default vacuuming behavior, why would this be?
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i™
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Apr 25, 2005, at 12:18 PM, Steve Poe wrote:
Tom,
Just a quick thought: after each run/sample of pgbench, I drop the 
database and recreate it. When I don't, my results become more skewed.

Steve Poe


---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Final decision

2005-04-27 Thread Steve Poe
Joshua,
This article was in July 2002, so is there an update to this information? 
When will a new ODBC driver be available for testing? Is there a release 
of the ODBC driver with better performance than 7.0.3.0200 for a 7.4.x 
database?

Steve Poe
We have mentioned it on the list.
http://www.linuxdevcenter.com/pub/a/linux/2002/07/16/drake.html
Regards, Dave
[and yes, I know Joshua said Command Prompt are rewriting /their/
driver]

:) No we are rewriting a complete OSS driver.
Sincerely,
Joshua D. Drake
Command Prompt, Inc.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] batch inserts are "slow"

2005-05-03 Thread Steve Wampler
Tim Terlegård wrote:
>>
>>Just as on Oracle you would use SQL*Loader for this application, you
>>should use the COPY syntax for PostgreSQL.  You will find it a lot
>>faster.  I have used it by building the input files and executing
>>'psql' with a COPY command, and also by using it with a subprocess,
>>both are quite effective.
> 
> 
> I tried this now. Now it's down to 0.45 seconds. It feels a bit hacky to
> run /usr/bin/psql from java, but it sure works. Thanks for the hint!

There was a patch against 7.4 that provided direct JDBC access to
PostgreSQL's COPY.  (I have it installed here and *love* it - it
gives outstanding performance.)  However, it hasn't made it into an
official release yet.  I don't know why, perhaps there's
a problem yet to be solved with it ('works for me', though)?

Is this still on the board?  I won't upgrade past 7.4 without it.

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Whence the Opterons?

2005-05-06 Thread Steve Poe
IBM, Sun and HP have their fairly pricey Opteron systems.
The IT people are not swell about unsupported purchases off ebay.

Mischa,
I certainly understand your concern, but price and support 
sometimes go hand-in-hand. You may have to pick your battles if you 
want more bang for the buck or more support. I might be wrong on this, 
but not everything you buy on eBay is unsupported.

We purchased a dual Opteron from Sun off their eBay store for about $3K 
less than the "buy it now" price.

From an IT perspective, support is not as critical if I can do it 
myself. If it is for 24/7 business operations, then the company should 
be able to put some money behind what they want to run their business 
on. Your mileage may vary.

Steve

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


[PERFORM] Postgresql Performance via the LSI MegaRAID 2x Card

2005-05-13 Thread Steve Poe
Past recommendations for a good RAID card (for SCSI) have been the LSI 
MegaRAID 2x. This unit comes with 128MB of RAM on-board. Has anyone 
found that by increasing the on-board RAM, Postgresql performed better?

Thanks.
Steve Poe
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL strugling during high load

2005-05-19 Thread Steve Poe
Josh Berkus wrote:

>Anjan,
>
>  
>
>>As far as disk I/O is concerned for flushing the buffers out, I am not
>>ruling out the combination of Dell PERC4 RAID card, and the RH AS 3.0
>>Update3 being a problem.
>>
>>
>
>You know that Update4 is out, yes?  
>Update3 is currenly throttling your I/O by about 50%.
>  
>

Is that 50% just for the Dell PERC4 RAID on RH AS 3.0?  Sounds like
severe context switching.

Steve Poe


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] Optimizing for writes. Data integrity not critical

2005-05-19 Thread Steve Bergman
Hi,
I am using postgresql in a small (almost trivial) application in which I 
pull some data out of a Cobol C/ISAM file and write it into a pgsql 
table.  My users can then use the data however they want by interfacing 
to the data from OpenOffice.org.

The amount of data written is about 60MB and takes a few minutes on a 
1200MHz Athlon with a single 60GB IDE drive running Fedora Core 3 with 
pgsql 7.4.7.  I'd like to speed up the DB writes a bit if possible.  
Data integrity is not at all critical as the database gets dropped, 
created, and populated immediately before each use.  Filesystem is ext3, 
data=ordered and I need to keep it that way as there is other data in 
the filesystem that I do care about.  I have not done any tuning in the 
config file yet, and was wondering what things would likely speed up 
writes in this situation.

I'm doing the writes individually.  Is there a better way?  Combining 
them all into a transaction or something?
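
To illustrate what I mean by combining them (the table name, columns, and 
file path here are made up), either wrap all the inserts in one transaction:

CREATE TABLE import_table (id integer, descr text);  -- stand-in for the real table

BEGIN;
INSERT INTO import_table (id, descr) VALUES (1, 'first row');
INSERT INTO import_table (id, descr) VALUES (2, 'second row');
-- ... thousands more rows ...
COMMIT;

or stage the rows as a tab-separated file and bulk-load them with COPY instead:

COPY import_table (id, descr) FROM '/tmp/import_data.tsv';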

Thanks,
Steve Bergman
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] sequential scan performance

2005-05-30 Thread Steve Atkins
On Sun, May 29, 2005 at 08:27:26AM -0500, Michael Engelhart wrote:
> Hi -
> 
> I have a table of about 3 million rows of city "aliases" that I need  
> to query using LIKE - for example:
> 
> select * from city_alias where city_name like '%FRANCISCO'
> 
> 
> When I do an EXPLAIN ANALYZE on the above query, the result is:
> 
>  Seq Scan on city_alias  (cost=0.00..59282.31 rows=2 width=42)  
> (actual time=73.369..3330.281 rows=407 loops=1)
>Filter: ((name)::text ~~ '%FRANCISCO'::text)
> Total runtime: 3330.524 ms
> (3 rows)
> 
> 
> this is a query that our system needs to do a LOT.   Is there any way  
> to improve the performance on this either with changes to our query  
> or by configuring the database deployment?   We have an index on  
> city_name but when using the % operator on the front of the query  
> string postgresql can't use the index .

If that's really what you're doing (the wildcard is always at the beginning)
then something like this

  create index city_name_idx on city_alias (reverse(city_name));

  select * from city_alias where reverse(city_name) like reverse('%FRANCISCO');

should do just what you need.

I use this, with a plpgsql implementation of reverse, and it works nicely.

CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS '
DECLARE
   original alias for $1;
   reverse_str text;
   i int4;
BEGIN
 reverse_str = '''';
 FOR i IN REVERSE LENGTH(original)..1 LOOP
  reverse_str = reverse_str || substr(original,i,1);
 END LOOP;
 return reverse_str;
END;'
LANGUAGE 'plpgsql' IMMUTABLE;


Someone will no doubt suggest using tsearch2, and you might want to
take a look at it if you actually need full-text search, but my
experience has been that it's too slow to be useful in production, and
it's not needed for the simple "leading wildcard" case.

Cheers,
  Steve

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[PERFORM] Postgresql and Software RAID/LVM

2005-06-03 Thread Steve Poe
I have a small business client that cannot afford high-end/high-quality
RAID cards for their next server. That's a separate argument/issue right
there for me, but what the client wants is what the client wants.

Has anyone run Postgres with software RAID or LVM on a production box?
What has been your experience?

I don't foresee more than 10-15 concurrent sessions running for their OLTP
application.

Thanks.

Steve Poe


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Importing from pg_dump slow, low Disk IO

2005-06-08 Thread Steve Pollard

Hi Everyone,

I'm having a performance issue with version 7.3.4 which I first thought was 
Disk IO related; however, now it seems like the problem is caused by really 
slow commits. This is running on Redhat 8.

Basically I'm taking a .sql file with inserts of about 15,000 lines and <'ing 
it straight into psql DATABASENAME; the disk writes never get over about 2000 
on this machine with a RAID5 SCSI setup, and this happens in my PROD and DEV 
environments.

I've installed the latest version on RedHat ES3 and copied the configs across, 
however the inserts there are really, really fast..

Was there a performance change from 7.3.4 to current to turn off autocommit by 
default, or is buffering handled differently?

I have ruled out Disk IO issues, as a simple 'cp' exceeds disk writes of 6 
(using vmstat).

If I do this with a BEGIN; and COMMIT; it's really fast; however, that's not 
practical as I'm setting up a cold-standby server for automation.

I have been trying to debug for a few days now and see nothing.. here is some 
info:

::
/proc/sys/kernel/shmall
::
2097152
::
/proc/sys/kernel/shmmax
::
134217728
::
/proc/sys/kernel/shmmni
::
4096


shared_buffers = 51200
max_fsm_relations = 1000
max_fsm_pages = 1
max_locks_per_transaction = 64
wal_buffers = 64
effective_cache_size = 65536

MemTotal:  1547608 kB
MemFree: 47076 kB
MemShared:   0 kB
Buffers:134084 kB
Cached:1186596 kB
SwapCached:544 kB
Active: 357048 kB
ActiveAnon: 105832 kB
ActiveCache:251216 kB
Inact_dirty:321020 kB
Inact_laundry:  719492 kB
Inact_clean: 28956 kB
Inact_target:   285300 kB
HighTotal:  655336 kB
HighFree: 1024 kB
LowTotal:   892272 kB
LowFree: 46052 kB
SwapTotal: 1534056 kB
SwapFree:  1526460 kB

This is a real doozy for me; please provide any advice possible.

Steve

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Importing from pg_dump slow, low Disk IO

2005-06-08 Thread Steve Pollard
As a follow-up to this, I've installed 7.3.4 on another test Redhat 8 machine
and the slow inserts are present; however, on another machine with ES3
the same 15,000 inserts are about 20 times faster. Does anyone know of a
change that would affect this, kernel or Redhat release?

Steve

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steve
Pollard
Sent: Wednesday, 8 June 2005 6:39 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Importing from pg_dump slow, low Disk IO


Hi Everyone,

Im having a performance issue with version 7.3.4 which i first thought
was Disk IO related, however now it seems like the problem is caused by
really slow commits, this is running on Redhat 8.

Basically im taking a .sql file with insert of about 15,000 lines and
<'ing straight into psql DATABASENAME, the Disk writes never gets over
about 2000 on this machine with a RAID5 SCSI setup, this happens in my
PROD and DEV environment.

Ive installed the latest version on RedHat ES3 and copied the configs
across however the inserts are really really fast..

Was there a performce change from 7.3.4 to current to turn of
autocommits by default or is buffering handled differently ?

I have ruled out Disk IO issues as a siple 'cp' exceeds Disk writes to
6 (using vmstat)

If i do this with a BEGIN; and COMMIT; its really fast, however not
practical as im setting up a cold-standby server for automation.

Have been trying to debug for a few days now and see nothing.. here is
some info :

::
/proc/sys/kernel/shmall
::
2097152
::
/proc/sys/kernel/shmmax
::
134217728
::
/proc/sys/kernel/shmmni
::
4096


shared_buffers = 51200
max_fsm_relations = 1000
max_fsm_pages = 1
max_locks_per_transaction = 64
wal_buffers = 64
effective_cache_size = 65536

MemTotal:  1547608 kB
MemFree: 47076 kB
MemShared:   0 kB
Buffers:134084 kB
Cached:1186596 kB
SwapCached:544 kB
Active: 357048 kB
ActiveAnon: 105832 kB
ActiveCache:251216 kB
Inact_dirty:321020 kB
Inact_laundry:  719492 kB
Inact_clean: 28956 kB
Inact_target:   285300 kB
HighTotal:  655336 kB
HighFree: 1024 kB
LowTotal:   892272 kB
LowFree: 46052 kB
SwapTotal: 1534056 kB
SwapFree:  1526460 kB

This is a real doosey for me, please provide any advise possible.

Steve

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Importing from pg_dump slow, low Disk IO

2005-06-09 Thread Steve Pollard
Hi All,

Not sure if this is the correct fix or not, but a bit of research:
http://archives.postgresql.org/pgsql-hackers/2001-04/msg01129.php
and the official docs from Postgres:
http://www.postgresql.org/docs/7.4/static/wal-configuration.html
led me to try:
wal_sync_method = open_sync
and this has increased the speed on my Redhat 8 servers by 20X!

Steve
-Original Message-
From: Steve Pollard 
Sent: Thursday, 9 June 2005 1:27 PM
To: Steve Pollard; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Importing from pg_dump slow, low Disk IO

As a follow up to this ive installed on another test Rehat 8 machine
with
7.3.4 and slow inserts are present, however on another machine with ES3
the same 15,000 inserts is about 20 times faster, anyone know of a
change that would effect this, kernel or rehat release ?

Steve

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steve
Pollard
Sent: Wednesday, 8 June 2005 6:39 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Importing from pg_dump slow, low Disk IO


Hi Everyone,

Im having a performance issue with version 7.3.4 which i first thought
was Disk IO related, however now it seems like the problem is caused by
really slow commits, this is running on Redhat 8.

Basically im taking a .sql file with insert of about 15,000 lines and
<'ing straight into psql DATABASENAME, the Disk writes never gets over
about 2000 on this machine with a RAID5 SCSI setup, this happens in my
PROD and DEV environment.

Ive installed the latest version on RedHat ES3 and copied the configs
across however the inserts are really really fast..

Was there a performce change from 7.3.4 to current to turn of
autocommits by default or is buffering handled differently ?

I have ruled out Disk IO issues as a siple 'cp' exceeds Disk writes to
6 (using vmstat)

If i do this with a BEGIN; and COMMIT; its really fast, however not
practical as im setting up a cold-standby server for automation.

Have been trying to debug for a few days now and see nothing.. here is
some info :

::
/proc/sys/kernel/shmall
::
2097152
::
/proc/sys/kernel/shmmax
::
134217728
::
/proc/sys/kernel/shmmni
::
4096


shared_buffers = 51200
max_fsm_relations = 1000
max_fsm_pages = 1
max_locks_per_transaction = 64
wal_buffers = 64
effective_cache_size = 65536

MemTotal:  1547608 kB
MemFree: 47076 kB
MemShared:   0 kB
Buffers:134084 kB
Cached:1186596 kB
SwapCached:544 kB
Active: 357048 kB
ActiveAnon: 105832 kB
ActiveCache:251216 kB
Inact_dirty:321020 kB
Inact_laundry:  719492 kB
Inact_clean: 28956 kB
Inact_target:   285300 kB
HighTotal:  655336 kB
HighFree: 1024 kB
LowTotal:   892272 kB
LowFree: 46052 kB
SwapTotal: 1534056 kB
SwapFree:  1526460 kB

This is a real doosey for me, please provide any advise possible.

Steve

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] faster search

2005-06-10 Thread Steve Atkins
On Fri, Jun 10, 2005 at 01:45:05PM -0400, Clark Slater wrote:
> Hi-
> 
> Would someone please enlighten me as
> to why I'm not seeing a faster execution
> time on the simple scenario below?

Because you need to extract a huge number of rows via a seqscan, sort
them and then throw them away, I think.

> explain analyze select * from test where productlistid=3 and typeid=9 
> order by partnumber limit 15;

Create an index on (productlistid, typeid, partnumber) then

  select * from test where productlistid=3 and typeid=9
   order by productlistid, typeid, partnumber LIMIT 15;

?
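
Concretely, the index would be something like this (the name is just 
illustrative):

  create index test_plist_type_part_idx on test (productlistid, typeid, partnumber);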

Cheers,
  Steve

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Planner constants for RAM resident databases

2005-07-01 Thread Steve Atkins
On Fri, Jul 01, 2005 at 09:59:38PM -0400, Emil Briggs wrote:

> I'm working with an application where the database is entirely resident in 
> RAM 
> (the server is a quad opteron with 16GBytes of memory). It's a web 
> application and handles a high volume of queries. The planner seems to be 
> generating poor  plans for some of our queries which I can fix by raising 
> cpu_tuple_cost. I have seen some other comments in the archives saying that 
> this is a bad idea  but is that necessarily the case when the database is 
> entirely resident in RAM?

If I'm understanding correctly that'll mostly increase the estimated
cost of handling a row relative to a sequential page fetch, which
sure sounds like it'll push plans in the right direction, but it
doesn't sound like the right knob to twiddle.

What do you have random_page_cost set to?

Cheers,
  Steve

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Table locking problems?

2005-08-09 Thread Steve Poe
Dan,

Do you mean you did RAID 1 + 0 (RAID 10) or RAID 0 + 1? Just a
clarification, since RAID 0 is still a single point of failure even if
RAID 1 is on top of RAID 0. 

How many users are connected when your update / delete queries are
hanging? Have you done an analyze verbose on those queries?

Have you made changes to the postgresql.conf? kernel.vm settings? IO
scheduler?

If you're not doing so already, you may consider running sar (iostat) to
monitor when the hanging occurs and whether there is a memory / IO bottleneck
somewhere.

Good luck.

Steve Poe


On Tue, 2005-08-09 at 12:04 -0600, Dan Harris wrote:
> I thought I would send this to pg-performance since so many people  
> helped me with my speed issues recently.  I was definitely IO- 
> bottlenecked.
> 
> Since then, I have installed 2 RAID arrays with 7 15k drives in them  
> in RAID 0+1 as well as add a new controller card with 512MB of cache  
> on it.  I also created this new partition on the RAID as XFS instead  
> of ext3.
> 
> These changes have definitely improved performance, but I am now  
> finding some trouble with UPDATE or DELETE queries "hanging" and  
> never releasing their locks.  As this happens, other statements queue  
> up behind it.  It seems to occur at times of very high loads on the  
> box.  Is my only option to kill the query ( which usually takes down  
> the whole postmaster with it! ouch ).
> 
> Could these locking issues be related to the other changes I made?  
> I'm really scared that this is related to choosing XFS, but I sure  
> hope not.   How should I go about troubleshooting the "problem"  
> queries?  They don't seem to be specific to a single table or single  
> database.
> 
> I'm running 8.0.1 on kernel 2.6.12-3 on 64-bit Opterons if that  
> matters..
> 
> 
> -Dan
> 


[SPAM?] Re: [PERFORM] PG8 Tuning

2005-08-11 Thread Steve Poe
Paul,

Before I say anything else, one online document which may be of
assistance to you is:
http://www.powerpostgresql.com/PerfList/

Some thoughts I have:

3) Your shared RAM setting seems overkill to me. Part of the challenge
is that you're going from 1,000 to 262K with no assessment in between. Each
situation can be different, but try something in the range of 10K - 50K.
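
For reference, a rough back-of-the-envelope check of what those settings
mean in memory terms (a sketch; it assumes the default 8 kB block size):

    -- 262143 buffers * 8 kB is roughly 2 GB of shared buffer cache;
    -- 10K - 50K buffers is roughly 80 - 400 MB.
    SELECT 262143 * 8192 / (1024 * 1024) AS current_setting_mb,
           50000  * 8192 / (1024 * 1024) AS suggested_upper_mb;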

4) pg_xlog: If your pg_xlog is on a spindle that is *only* for pg_xlog,
you're better off. If it is sharing with any other OS/DB resource, the
performance will be impacted.

From what I have learned from others on this list, RAID5 is not the best
choice for the database. RAID10 would be a better solution (using 8 of
your disks); then take the remaining disk and mirror your pg_xlog onto it
if possible.

Best of luck,

Steve Poe

On Thu, 2005-08-11 at 13:23 +0100, Paul Johnson wrote:
> Hi all, we're running PG8 on a Sun V250 with 8GB RAM and 2*1.3GHz SPARC
> CPUs running Solaris 10. The DB cluster is on an external fibre-attached
> Sun T3 array that has 9*36GB drives configured as a single RAID5 LUN.
> 
> The system is for the sole use of a couple of data warehouse developers,
> hence we are keen to use 'aggressive' tuning options to maximise
> performance.
> 
> So far we have made the following changes and measured the impact on our
> test suite:
> 
> 1) Increase checkpoint_segments from 3 to 64. This made a 10x improvement
> in some cases.
> 
> 2) Increase work_mem from 1,024 to 524,288.
> 
> 3) Increase shared_buffers from 1,000 to 262,143 (2 GB). This required
> setting SHMMAX=4294967295 (4 GB) in /etc/system and re-booting the box.
> 
> Question - can Postgres only use 2GB RAM, given that shared_buffers can
> only be set as high as 262,143 (8K pages)?
> 
> So far so good...
> 
> 4) Move /pg_xlog to an internal disk within the V250. This has had a
> severe *negative* impact on performance. Copy job has gone from 2 mins to
> 12 mins, simple SQL job gone from 1 min to 7 mins. Not even run long SQL
> jobs.
> 
> I'm guessing that this is because pg_xlog has gone from a 9 spindle LUN to
> a single spindle disk?
> 
> In cases such as this, where an external storage array with a hardware
> RAID controller is used, the normal advice to separate the data from the
> pg_xlog  seems to come unstuck, or are we missing something?
> 
> Cheers,
> 
> Paul Johnson.
> 


Re: [PERFORM] Read/Write block sizes

2005-08-23 Thread Steve Poe
Chris,

Unless I am wrong, you're making the assumption that the amount of time spent
and the ROI are known. Maybe those who've been down this path know how to get
that additional 2-4% in 30 minutes or less?

While each person's and business's performance gains (or not) could vary,
someone spending the 50-100h to gain 2-4% over the course of a month for a
24x7 operation would seem worth the investment?

I would assume that dbt2 with STP helps minimize the number of hours
someone has to invest to determine performance gains with configurable
options?

Steve Poe

> If someone spends 100h working on one of these items, and gets a 2%
> performance improvement, that's almost certain to be less desirable
> than spending 50h on something else that gets a 4% improvement.
> 
> And we might discover that memory management improvements in Linux
> 2.6.16 or FreeBSD 5.5 allow some OS kernels to provide some such
> improvements "for free" behind our backs without *any* need to write
> database code.  :-)




Re: [PERFORM] What *_mem to increase when running CLUSTER

2005-08-25 Thread Steve Poe
Andrew,

On Thu, 2005-08-25 at 12:24 -0700, Andrew Lazarus wrote:
> Should I temporarily increase sort_mem, vacuum_mem, neither, or both 
> when doing a CLUSTER on a large (100 million row) table where as many as 
> half of the tuples are deadwood from UPDATEs or DELETEs? I have large 
> batch (10 million row) inserts, updates, and deletes so I'm not sure 
> frequent vacuuming would help.

You may need to experiment with both. What version of Postgres? What is
the size of your database? How many concurrent users? If you're seeing that
half of the tuples are dead, I would look at checking your max_fsm_pages and
max_fsm_relations after a full vacuum analyze before doing too much with
sort mem.
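
A sketch of that check (the exact wording of the VERBOSE output varies by
version, but the free space map summary appears at the end of a
database-wide vacuum):

    VACUUM FULL VERBOSE ANALYZE;
    -- the last lines report how many pages and relations the free space
    -- map would need; compare them against:
    SHOW max_fsm_pages;
    SHOW max_fsm_relations;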

Your mileage may vary.

Best of luck.

Steve Poe




Re: [PERFORM] How can this be?

2005-09-19 Thread Steve Atkins
On Fri, Sep 16, 2005 at 08:34:14PM -0500, Martin Nickel wrote:

> Hello all,
> Mostly Postgres makes sense to me.  But now and then it does something
> that  boggles my brain.  Take the statements below.  I have a table
> (agent) with 5300 rows.  The primary key is agent_id.  I can do SELECT
> agent_id FROM agent and it returns all PK values in less than half a
> second (dual Opteron box, 4G ram, SATA Raid 10 drive system).
> 
> But when I do a DELETE on two rows with an IN statement, using the primary
> key index (as stated by EXPLAIN) it take almost 4 minutes.
> pg_stat_activity shows nine other connections, all idle.
> 
> If someone can explain this to me it will help restore my general faith in
> order and consistency in the universe.

When you delete a row from agent PG needs to find any matching rows in
office. Is office large? Is office(office_id) indexed?

> -- Executing query:
> DELETE FROM agent WHERE agent_id IN (15395, 15394);
> Query returned successfully: 2 rows affected, 224092 ms execution time.
> 
> -- Executing query:
> EXPLAIN DELETE FROM agent WHERE agent_id IN (15395, 15394);
> Index Scan using agent2_pkey, agent2_pkey on agent  (cost=0.00..7.27
> rows=2 width=6)
> Index Cond: ((agent_id = 15395) OR (agent_id = 15394))
> 
> Here's my table
> CREATE TABLE agent
> (
>   agent_id int4 NOT NULL DEFAULT nextval('agent_id_seq'::text),
>   office_id int4 NOT NULL,
>   lastname varchar(25),
>   firstname varchar(25),
> ...other columns...  
>   CONSTRAINT agent2_pkey PRIMARY KEY (agent_id),
>   CONSTRAINT agent_office_fk FOREIGN KEY (office_id) REFERENCES office 
> (office_id) ON UPDATE RESTRICT ON DELETE RESTRICT
> ) 
> WITHOUT OIDS;

Cheers,
  Steve



Re: [PERFORM] Alternative to a temporary table

2005-10-03 Thread Steve Atkins
On Mon, Oct 03, 2005 at 11:47:52AM -0400, Steven Rosenstein wrote:

> I currently create a temporary table to hold the selected server_id's and
> characteristics.  I then join this temp table with other data tables to
> produce my reports.  My reason for using the temporary table method is that
> the SQL for the initial server selection is dynamically created based on
> the user's selections, and is complex enough that it does not lend itself
> to being easily incorporated into any of the other data extraction queries
> (which may also contain dynamic filtering).
> 
> Unfortunately, the PHP connection to the database does not survive from
> webscreen to webscreen, so I have to re-generate the temporary server_id
> table each time it is needed for a report screen.  An idea I had to make
> this process more efficient was instead of re-creating the temporary table
> over and over each time it is needed, do a one-time extraction of the list
> of user-selected server_id's, store the list in a PHP global variable, and
> then use the list in a dynamically-created WHERE clause in the rest of the
> queries.  The resulting query would look something like
> 
>  SELECT *
>  FROM some_data_table
>  WHERE server_id IN (sid1,sid5,sid6,sid17,sid24...)
> 
> Simple enough, however in rare cases the list of server_id's can range
> between 6,000 and 10,000.
> 
> My question to the group is, can having so many values in a WHERE/IN clause
> affect query performance?

Probably, yes. As always, benchmark a test case, but last time I
checked (in 7.4) you'd be better off creating a new temporary table for
every query than using an IN list that long. It's a lot better in 8.0, I
believe, so you should benchmark it there.

> Am I being naive about this and is there a
> different, better way?  The server_id field is of course indexed, but it is
> possible that the list of selected sid's can contain almost all of the
> values in the some_data_table server_id index (in the situation where _all_
> of the records are requested I wouldn't use the WHERE clause in the query).
> The some_data_table can contain millions of records for thousands of
> servers, so every bit of efficiency helps.

Don't use a temporary table. Instead use a permanent table that
contains the server ids you need and the PHP session token. Then you
can create your list of server_ids once and insert it into that table
associated with your sessionid. Then future queries can be simple
joins against that table.

SELECT some_data_table.*
FROM some_data_table, session_table
WHERE some_data_table.server_id = session_table.server_id
  AND session_table.session_id = 'foobar'

You'd need a reaper process to delete old data from that table to
prevent it from growing without limit, and probably a table associating
session start time with sessionid to make reaping easier.
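
A minimal sketch of that layout (matching the query above; the reaper and
the session-start bookkeeping are left out):

    CREATE TABLE session_table (
        session_id text NOT NULL,
        server_id  integer NOT NULL
    );
    CREATE INDEX session_table_session_id_idx
        ON session_table (session_id);

    -- filled once per user selection, e.g.
    --   INSERT INTO session_table (session_id, server_id)
    --   VALUES ('foobar', 42);
    -- and reaped periodically by deleting rows for expired session_ids.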

Cheers,
  Steve



Re: [PERFORM] Is There Any Way ....

2005-10-04 Thread Steve Atkins
On Tue, Oct 04, 2005 at 11:06:54PM -0400, Ron Peacetree wrote:

> Some might even argue that IBM (where Codd and Date worked)
> and Oracle just _might_ have had justification for the huge effort
> they put into developing such infrastructure. 

The OS and FS world is very, very different now than it was when
the Oracle and DB2 architectures were being crafted. What may have
been an excellent development effort then may not provide such good
ROI now.

> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points out that the more context dependent (ie application- or
> domain-specific aware) caching strategies are, the better
> they are.
> 
> Maybe after we do all we can about physical IO and sorting
> performance I'll take on the religious fanatics on this one.

Actually, the main "religious fanatic" I've seen recently is yourself.
While I have a gut feeling that some of the issues you raise could
certainly do with further investigation, I'm not seeing that much from
you other than statements that much of what postgresql does is wrong
(not "wrong for your use case", but "wrong in every respect").

A little less arrogance and a little more "here are some possibilities
for improvement", "here is an estimate of the amount of effort that
might be needed" and "here are some rough benchmarks showing the
potential return on that investment" would, at the very least, make
the threads far less grating to read.

Cheers,
  Steve



[PERFORM] XFS External Log on Pg 7.4.8 Pg_xlog drives?

2005-10-10 Thread Steve Poe
Fellow Postgresql users,

I have a Pg 7.4.8 data directory on a 6-disc XFS RAID10 (U320 SCSI on an
LSI MegaRAID w/BBU). The pg_xlog is on its own RAID1 with nothing else.

I don't have room for more drives, but I am considering moving the XFS
external log for the data directory to the RAID1 where the pg_xlog
lives. Unfortunately, I don't have room on the RAID1 that the OS lives
on (CentOS Linux 4.1).

Anyone have any experience moving the XFS log to the pg_xlog drives? I'm
guessing the benefit and cost will cancel each other out.

Thanks.

Steve Poe




Re: [PERFORM] Help tuning postgres

2005-10-12 Thread Steve Poe

Would it not be faster to do a dump/reload of the table than to reindex,
or is it about the same?

Steve Poe

On Wed, 2005-10-12 at 13:21 -0400, Tom Lane wrote:
> Emil Briggs <[EMAIL PROTECTED]> writes:
> >> Not yet, the db is in production use and I have to plan for a down-time
> >> for that... or is it not impacting the activity on the table ?
> 
> > It will cause some performance hit while you are doing it.
> 
> It'll also lock out writes on the table until the index is rebuilt,
> so he does need to schedule downtime.
> 
>   regards, tom lane


Re: [PERFORM] Best way to get all different values in a column

2005-10-14 Thread Steve Atkins
On Fri, Oct 14, 2005 at 06:02:56PM +0200, [EMAIL PROTECTED] wrote:
> Ok, since my question got no answer on the general list, I'm reposting
> it here since this list seems in fact better suited to it.
>  
> Does anyone here know what  is  the most efficient way to  list all
> different values of a given column with low cardinality ?  For instance
> I have a table with columns DAY, NAME, ID, etc.  The table is updated
> about each week with thousands of records with the same (current) date.
> Now I would like to list all values for DAY, only if possible without
> scanning all the table each time I submit the request.
>  
> I can think of:
>  
> Solution 1: SELECT DAY FROM TABLE GROUP BY DAY;
>  
> Solution 2: SELECT DISTINCT DAY FROM TABLE;
>  
> (BTW why do those two yield such different performances, the latter being
> seemingly *much* slower than the former?)
>  
> Solution 3: Improve performance through an index scan by using DAY as
> the first element of the PK,  (PRIMARY KEY (DAY, ID) ), although DAY has
> a low cardinality ?
>  
> Solution 4: Create a separate index on column DAY ?
>  
> Solution 5: Use some kind of view / stored procedure that would be
> precomputed when TABLE is updated or cached when called for the first
> time ? Does something like that exist ?
>  
> Solution 6: Store the values in a separate table, recreated each time
> TABLE is updated.
>  
> This looks to me as a very common problem. Is there an obvious / best /
> standard solution there ? What would be the expected performance of the
> different solutions above ? (I guess some are probably non-sense)
>  

There's not going to be a single "best" solution, as it'll depend on
your requirements, and on your application level constraints.

You say that the table is seldom updated (a few thousand a week is "almost
never"). If it's updated in a single batch you could simply generate
a table of the distinct values after each update pretty easily (solution
6).

If you don't have such a well-defined update, then you can use a trigger on
inserts, updates and deletes of the table to maintain a separate table that
keeps track of the count of each distinct value; you can then just select any
value with a non-zero count from that table (solution 5). (You need the
counts to be able to deal with deletes efficiently.) That would increase the
cost of updating the main table significantly, but you're putting very little
traffic through it, so that's unlikely to be a problem.
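
A rough sketch of solution 5 (all names here are illustrative assumptions,
the dollar-quoting needs 8.0 or later, and the insert-or-update race is
ignored since you put so little write traffic through the table):

    CREATE TABLE day_counts (day date PRIMARY KEY, n bigint NOT NULL);

    CREATE OR REPLACE FUNCTION maintain_day_counts() RETURNS trigger AS $$
    BEGIN
        IF TG_OP IN ('INSERT', 'UPDATE') THEN
            UPDATE day_counts SET n = n + 1 WHERE day = NEW.day;
            IF NOT FOUND THEN
                INSERT INTO day_counts VALUES (NEW.day, 1);
            END IF;
        END IF;
        IF TG_OP IN ('DELETE', 'UPDATE') THEN
            UPDATE day_counts SET n = n - 1 WHERE day = OLD.day;
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER maintain_day_counts_trig
        AFTER INSERT OR UPDATE OR DELETE ON your_table
        FOR EACH ROW EXECUTE PROCEDURE maintain_day_counts();

    -- listing the distinct values is then just:
    SELECT day FROM day_counts WHERE n > 0;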

I doubt that solutions 3 or 4 are worth looking at at all, and the first
two are what they are and you know their performance already.

You could probably do this far more efficiently with some of the work
being done in the application layer, rather than in the database - for
instance you could update the counts table one time per transaction,
rather than one time per operation - but that would lose you the
convenience of maintaining the counts correctly when you futz with
the data manually or using tools not aware of the count table.

Cheers,
  Steve



[PERFORM] Help speeding up delete

2005-11-14 Thread Steve Wampler

We've got an older system in production (PG 7.2.4).  Recently
one of the users has wanted to implement a selective delete,
but is finding that the time it appears to take exceeds her
patience factor by several orders of magnitude.  Here's
a synopsis of her report.  It appears that the "WHERE
id IN ..." is resulting in a seq scan that is causing
the problem, but we're not SQL expert enough to know
what to do about it.

Can someone point out what we're doing wrong, or how we
could get a (much) faster delete?  Thanks!

Report:

This command yields results in only a few seconds:

# SELECT at.id FROM "tmp_table2" at, "tmp_tabl2e" a
#   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505';

However, the following command does not seem to want to ever
complete (the person running it killed it after half an hour).

# DELETE FROM "tmp_table2" WHERE id IN
# (SELECT at.id FROM "tmp_table2" at, "tmp_table2" a
#   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505');

==

The table has five columns. There are 6175 rows satisfying the condition
given, and the table itself has 1539688 entries.  Layout is:

lab.devel.configdb=# \d tmp_table2
            Table "tmp_table2"
 Column |           Type           | Modifiers
--------+--------------------------+-----------
 id     | character varying(64)    |
 name   | character varying(64)    |
 units  | character varying(32)    |
 value  | text                     |
 time   | timestamp with time zone |

==

lab.devel.configdb=# EXPLAIN DELETE FROM "tmp_table2" WHERE id IN
lab.devel.configdb-#   (SELECT at.id FROM "tmp_table2" at, "tmp_table2" a
lab.devel.configdb(#    WHERE at.id=a.id AND a.name='obsid' AND a.value='oid080505');
NOTICE:  QUERY PLAN:

Seq Scan on tmp_table2  (cost=0.00..154893452082.10 rows=769844 width=6)
  SubPlan
    ->  Materialize  (cost=100600.52..100600.52 rows=296330 width=100)
          ->  Hash Join  (cost=42674.42..100600.52 rows=296330 width=100)
                ->  Seq Scan on tmp_table2 at  (cost=0.00..34975.88 rows=1539688 width=50)
                ->  Hash  (cost=42674.32..42674.32 rows=38 width=50)
                      ->  Seq Scan on tmp_table2 a  (cost=0.00..42674.32 rows=38 width=50)
EXPLAIN

lab.devel.configdb=# EXPLAIN (SELECT at.id FROM "tmp_table2" at, "tmp_table2" a
lab.devel.configdb(#    WHERE at.id=a.id AND a.name='obsid' AND a.value='oid080505');
NOTICE:  QUERY PLAN:

Hash Join  (cost=42674.42..100600.52 rows=296330 width=100)
  ->  Seq Scan on tmp_table2 at  (cost=0.00..34975.88 rows=1539688 width=50)
  ->  Hash  (cost=42674.32..42674.32 rows=38 width=50)
        ->  Seq Scan on tmp_table2 a  (cost=0.00..42674.32 rows=38 width=50)

EXPLAIN

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help speeding up delete

2005-11-14 Thread Steve Wampler
Scott Lamb wrote:
> On Nov 14, 2005, at 2:07 PM, Steve Wampler wrote:
> 
>> # SELECT at.id FROM "tmp_table2" at, "tmp_tabl2e" a
>> #   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505';
> 
> 
> Isn't this equivalent?
> 
> select id from tmp_table2 where name = 'obsid' and value = 'oid080505';

Probably, the user based the above on a query designed to find
all rows with the same id as those rows that have a.name='obsid' and
a.value='oid080505'.  However, I think the above would work to locate
all the ids, which is all we need for the delete (see below)

>> # DELETE FROM "tmp_table2" WHERE id IN
>> # (SELECT at.id FROM "tmp_table2" at, "tmp_table2" a
>> #   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505');
> 
> 
> and this?
> 
> delete from tmp_table2 where name = 'obsid' and value = 'oid080505';
> 
> Why are you doing a self-join using id, which I assume is a primary key?

Because I think we need to.  The above would only delete rows that have
name = 'obsid' and value = 'oid080505'.  We need to delete all rows that
have the same ids as those rows.  However, from what you note, I bet
we could do:

   DELETE FROM "tmp_table2" WHERE id IN
  (SELECT id FROM "temp_table2" WHERE name = 'obsid' and value= 
'oid080505');

However, even that seems to have a much higher cost than I'd expect:

   lab.devel.configdb=# explain delete from "tmp_table2" where id in
(select id from tmp_table2 where name='obsid' and value = 'oid080505');
   NOTICE:  QUERY PLAN:

   Seq Scan on tmp_table2  (cost=0.00..65705177237.26 rows=769844 width=6)
 SubPlan
   ->  Materialize  (cost=42674.32..42674.32 rows=38 width=50)
 ->  Seq Scan on tmp_table2  (cost=0.00..42674.32 rows=38 width=50)

   EXPLAIN

And, sure enough, it is taking an extraordinarily long time to run (more than
10 minutes so far, compared to < 10 seconds for the select).  Is this
really typical of deletes?  It appears (to me) to be the Seq Scan on tmp_table2
that is the killer here.  If we put an index on, would it help?  (The user
claims she tried that and its EXPLAIN cost went even higher, but I haven't
checked that...)

Thanks!
-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help speeding up delete

2005-11-14 Thread Steve Wampler
Tom Lane wrote:
> Steve Wampler <[EMAIL PROTECTED]> writes:
> 
>>We've got an older system in production (PG 7.2.4).  Recently
>>one of the users has wanted to implement a selective delete,
>>but is finding that the time it appears to take exceeds her
>>patience factor by several orders of magnitude.  Here's
>>a synopsis of her report.  It appears that the "WHERE
>>id IN ..." is resulting in a seq scan that is causing
>>the problem, but we're not SQL expert enough to know
>>what to do about it.
> 
> 
>>Can someone point out what we're doing wrong, or how we
>>could get a (much) faster delete?  Thanks!
> 
> 
> Update to 7.4 or later ;-)

I was afraid you'd say that :-)  I'm not officially involved in
this project anymore and was hoping for a fix that wouldn't drag
me back in.  The security issues aren't a concern because this
DB is *well* hidden from the outside world (it's part of a telescope
control system behind several firewalls with no outside access).
However, the data-loss-grade bugs issue *is* important.  We'll
try to do the upgrade as soon as we get some cloudy days to
actually do it!

Is the performance behavior that we're experiencing a known
problem with 7.2 that has been addressed in 7.4?  Or will the
upgrade fix other problems while leaving this one?

> Quite seriously, if you're still using 7.2.4 for production purposes
> you could justifiably be accused of negligence.  There are three or four
> data-loss-grade bugs fixed in the later 7.2.x releases, not to mention
> security holes; and that was before we abandoned support for 7.2.
> You *really* need to be thinking about an update.

Thanks!
Steve
-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help speeding up delete

2005-11-14 Thread Steve Wampler
Joshua Marsh wrote:
> 
> 
> On 11/14/05, *Steve Wampler* <[EMAIL PROTECTED]
> <mailto:[EMAIL PROTECTED]>> wrote:
> 
> However, even that seems to have a much higher cost than I'd expect:
> 
>lab.devel.configdb=# explain delete from "tmp_table2" where id in
> (select id from tmp_table2 where name='obsid' and value =
> 'oid080505');
>NOTICE:  QUERY PLAN:
> 
>Seq Scan on tmp_table2  (cost=0.00..65705177237.26 rows=769844
> width=6)
>  SubPlan
>->  Materialize  (cost=42674.32..42674.32 rows=38 width=50)
>  ->  Seq Scan on tmp_table2  (cost=0.00..42674.32
> rows=38 width=50)
> 
> 
> For one reason or another, the planner thinks a sequential scan is the
> best solution. Try turning off seq scans before the query and see if it
> changes the plan (SET enable_seqscan TO off;).
> 
> I've seen this problem with subqueries and that usually solves it.
> 

Hmmm, not only does it still use sequential scans, it thinks it'll take
even longer:

   set enable_seqscan to off;
   SET VARIABLE
   explain delete from "tmp_table2" where id in
     (select id from tmp_table2 where name='obsid' and value = 'oid080505');
   NOTICE:  QUERY PLAN:

   Seq Scan on tmp_table2  (cost=100000000.00..160237039405992.50 rows=800836 width=6)
     SubPlan
       ->  Materialize  (cost=100043604.06..100043604.06 rows=45 width=26)
             ->  Seq Scan on tmp_table2  (cost=100000000.00..100043604.06 rows=45 width=26)

   EXPLAIN

But the advice sounds like it *should* have helped...

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help speeding up delete

2005-11-14 Thread Steve Wampler
Scott Lamb wrote:
> On Nov 14, 2005, at 3:52 PM, Steve Wampler wrote:
> 
>> Scott Lamb wrote:
>>
>>> On Nov 14, 2005, at 2:07 PM, Steve Wampler wrote:
>>>
>>>> # SELECT at.id FROM "tmp_table2" at, "tmp_tabl2e" a
>>>> #   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505';
>>>
>>>
>>>
>>> Isn't this equivalent?
>>>
>>> select id from tmp_table2 where name = 'obsid' and value =  'oid080505';
>>
>>
>> Probably, the user based the above on a query designed to find
>> all rows with the same id as those rows that have a.name='obsid' and
>> a.value='oid080505'.
> 
> 
> Well, this indirection is only significant if those two sets can 
> differ. If (A) you meant "tmp_table2" when you wrote "tmp_tabl2e", so 
> this is a self-join, and (B) there is a primary key on "id", I don't 
> think that can ever happen.

I wasn't clear.  The original query was:

   SELECT at.* FROM "tmp_table2" at, "tmp_table2" a
   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505';

which is significantly different than:

   SELECT * FROM "tmp_table2" WHERE name='obsid' and value='oid080505';

The user had adapted that query for her needs, but it would have been
better to just use the query that you suggested (as the subselect in
the DELETE FROM...).  Unfortunately, that only improves performance
slightly - it is still way too slow on deletes.

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help speeding up delete

2005-11-15 Thread Steve Wampler
Magnus Hagander wrote:
>>Because I think we need to.  The above would only delete rows 
>>that have name = 'obsid' and value = 'oid080505'.  We need to 
>>delete all rows that have the same ids as those rows.  
>>However, from what you note, I bet we could do:
>>
>>   DELETE FROM "tmp_table2" WHERE id IN
>>  (SELECT id FROM "temp_table2" WHERE name = 'obsid' and 
>>value= 'oid080505');
>>
>>However, even that seems to have a much higher cost than I'd expect:
>>
>>   lab.devel.configdb=# explain delete from "tmp_table2" where id in
>>(select id from tmp_table2 where name='obsid' and 
>>value = 'oid080505');
>>   NOTICE:  QUERY PLAN:
>>
>>   Seq Scan on tmp_table2  (cost=0.00..65705177237.26 
>>rows=769844 width=6)
>> SubPlan
>>   ->  Materialize  (cost=42674.32..42674.32 rows=38 width=50)
>> ->  Seq Scan on tmp_table2  (cost=0.00..42674.32 
>>rows=38 width=50)
>>
>>   EXPLAIN
...
> 
> Earlier pg versions have always been bad at dealing with IN subqueries.
> Try rewriting it as (with fixing any broken syntax, I'm not actually
> testing this :P)
> 
> DELETE FROM tmp_table2 WHERE EXISTS 
>  (SELECT * FROM tmp_table2 t2 WHERE t2.id=tmp_table2.id AND
> t2.name='obsid' AND t2.value='oid080505')

Thanks - that looks *significantly* better:

   lab.devel.configdb=# explain delete from tmp_table2 where exists
   (select 1 from tmp_table2 t2 where
   t2.id=tmp_table2.id and
   t2.name='obsid' and t2.value='oid080505');
   NOTICE:  QUERY PLAN:

   Seq Scan on tmp_table2  (cost=0.00..9297614.80 rows=769844 width=6)
     SubPlan
       ->  Index Scan using inv_index_2 on tmp_table2 t2  (cost=0.00..6.02 rows=1 width=0)

   EXPLAIN

(This is after putting an index on the (id,name,value) tuple.)  That outer
seq scan is still annoying, but maybe this will be fast enough.
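
For reference, that index would have been created with something along
these lines (the name inv_index_2 is taken from the plan above; the column
order is an assumption):

    CREATE INDEX inv_index_2 ON tmp_table2 (id, name, value);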

I've passed this on, along with the (strong) recommendation that they
upgrade PG.

Thanks!!

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware

2005-11-15 Thread Steve Wampler
Joost Kraaijeveld wrote:
> If I understand correctly (I have 4GB ram):
> 
> [EMAIL PROTECTED]:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
> 1000000+0 records in
> 1000000+0 records out
> 8192000000 bytes transferred in 304.085269 seconds (26939812 bytes/sec)
> 
> Which looks suspicious: 26308 MB/sec???

Eh?  That looks more like ~25.7 MB/sec, assuming 1MB = 1024*1024 bytes.

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-16 Thread Steve Wampler
Joshua D. Drake wrote:
> The reason you want the dual core cpus is that PostgreSQL can only
> execute 1 query per cpu at a time,...

Is that true?  I knew that PG only used one cpu per query, but how
does PG know how many CPUs there are to limit the number of queries?

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-16 Thread Steve Wampler
David Boreham wrote:
> Steve Wampler wrote:
> 
>> Joshua D. Drake wrote:
>>  
>>
>>> The reason you want the dual core cpus is that PostgreSQL can only
>>> execute 1 query per cpu at a time,...
>>>   
>>
>>
>> Is that true?  I knew that PG only used one cpu per query, but how
>> does PG know how many CPUs there are to limit the number of queries?
>>
>>  
>>
> He means only one query can be executing on each cpu at any particular
> instant.

Got it - the cpu is only acting on one query in any instant but may be
switching between many 'simultaneous' queries.  PG isn't really involved
in the decision.  That makes sense.

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help speeding up delete

2005-11-16 Thread Steve Wampler
Arjen van der Meijden wrote:
> On 15-11-2005 15:18, Steve Wampler wrote:
> 
>> Magnus Hagander wrote:
>> (This is after putting an index on the (id,name,value) tuple.)  That
>> outer seq scan
>> is still annoying, but maybe this will be fast enough.
>>
>> I've passed this on, along with the (strong) recommendation that they
>> upgrade PG.
> 
> 
> Have you tried with an index on (name,value) and of course one on id ?

Yes, although not with a unique index on (name,value) [possible, but not
so on the just-id index].  Anyway, it turns out the latest incarnation
is 'fast enough' for the user's need, so she's not doing any more with
it until after an upgrade.


-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [pgsql-www] [PERFORM] Help speeding up delete

2005-11-17 Thread Steve Wampler
Christopher Kings-Lynne wrote:
>> That most people don't know they should use the new one I understand
>> though. But I don't see how this will help against that :-)
> 
> It'll make it easy...

As the miscreant that caused this thread to get started, let me
*wholeheartedly* agree with Chris.  An easy way to get the pg_dump
for the upgrade target to run with the upgradable source
would work wonders.  (Instructions included, of course.)


-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



[PERFORM] Database restore speed

2005-12-01 Thread Steve Oualline
Our application tries to insert data into the database as fast as it can.
Currently the work is being split into a number of 1MB copy operations.

When we restore, the postmaster process tries to use 100% of the CPU.

The questions we have are:

1) What is postmaster doing that it needs so much CPU?

2) How can we get our system to go faster?

Note: We've tried adjusting the checkpoint_segments parameter to no effect.

Any suggestions welcome.





Re: [PERFORM] Disk tests for a new database server

2005-12-08 Thread Steve Poe
Rory,

While I don't have the specific stats from my tests with XFS and
bonnie for our company's db server, I do recall vividly that seq. output
did not increase dramatically until I had 8+ discs in a RAID10
configuration on an LSI card. I was not using LVM. If I had fewer than 8
discs, seq. output was about equal regardless of the file system being
used (ext3, JFS, or XFS).

Steve


On Thu, 2005-12-08 at 12:12 +, Rory Campbell-Lange wrote:
> We are testing disk I/O on our new server (referred to in my recent
> questions about LVM and XFS on this list) and have run bonnie++ on the
> xfs partition destined for postgres; results noted below.
> 
> I haven't been able to find many benchmarks showing desirable IO stats.
> As far as I can tell the sequential input (around 110,000K/sec) looks
> good while the sequential output (around 50,000K/sec) looks fairly
> average.
> 
> Advice and comments gratefully received.
> Suggested Parameters for running pg_bench would be great!
> 
> Thanks,
> Rory
> 
> The server has a dual core AMD Opteron 270 chip (2000MHz), 6GB of RAM
> and an LSI 320-1 card running 4x147GB disks running in a RAID10
> configuration. The server has a freshly compiled 2.6.14.3 linux kernel.
> 
> partial df output:
> FilesystemSize  Used Avail Use% Mounted on
> ...
> /dev/mapper/masvg-masdata
>99G   33M   94G   1% /masdata
> /dev/mapper/masvg-postgres
>40G   92M   40G   1% /postgres
> 
> partial fstab config:
> ...
> /dev/mapper/masvg-masdata /masdataext3defaults0   
> 2
> /dev/mapper/masvg-postgres /postgres   xfs defaults0  
>  2
> 
> 
> Version  1.03  --Sequential Output-- --Sequential Input- --Random-
> -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
> --Seeks--
> MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec 
> %CP
> mas  13000M 52655  99 52951  10 32246   7 41441  72 113438  12 590.0  
>  0
> mas  13000M 49306  83 51967   9 32269   7 42442  70 115427  12 590.5  
>  1
> mas  13000M 53449  89 51982  10 32625   7 42819  71 111829  11 638.3  
>  0
> mas  13000M 51818  88 51639   9 33127   7 42377  70 108585  11 556.5  
>  0
> mas  13000M 48930  90 51750   9 32854   7 41220  71 109813  11 566.2  
>  0
> mas  13000M 52148  88 47393   9 35343   7 42871  70 109775  12 582.0  
>  0
> mas  13000M 52427  88 53040  10 32315   7 42813  71 112045  12 596.7  
>  0
> mas  13000M 51967  87 54004  10 30429   7 46180  76 110973  11 625.1  
>  0
> mas  13000M 48690  89 46416   9 35678   7 41429  72 111612  11 627.2  
>  0
> mas  13000M 52641  88 52807  10 31115   7 43476  72 110694  11 568.2  
>  0
> mas  13000M 52186  88 47385   9 35341   7 42959  72 110963  11 558.7  
>  0
> mas  13000M 52092  87 53111  10 32135   7 42636  69 110560  11 562.0  
>  1
> mas  13000M 49445  90 47378   9 34410   7 41191  72 110736  11 610.3  
>  0
> mas  13000M 51704  88 47699   9 35436   7 42413  69 110446  11 612.0  
>  0
> mas  13000M 52434  88 53331  10 32479   7 43229  71 109385  11 620.6  
>  0
> mas  13000M 52074  89 53291  10 32095   7 43593  72 109541  11 628.0  
>  0
> mas  13000M 48084  88 52624  10 32301   7 40975  72 110548  11 594.0  
>  0
> mas  13000M 53019  90 52441  10 32411   7 42574  68 111321  11 578.0  
>  0
> --Sequential Create-- Random 
> Create
> -Create-- --Read--- -Delete-- -Create-- --Read--- 
> -Delete--
> files:max:min/sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec 
> %CP
> mas  16  7970  36 + +++  7534  34  7918  36 + +++  4482  
> 22
> mas  16  7945  33 + +++  7483  30  7918  42 + +++  4438  
> 20
> mas  16  8481  48 + +++  7468  31  7870  39 + +++  4385  
> 23
> mas  16  7915  36 + +++  7498  33  7930  41 + +++  4619  
> 23
> mas  16  8553  35 + +++  8312  38  8613  37 + +++  4513  
> 24
> mas  16  8498  40 + +++  8215  33  8570  43 + +++  4858  
> 26
> mas  16  7892  39 + +++  7624  30  5341  28 + +++  4762  
> 22
> mas  16  5408  27 + +++  9378  37  8573  41 + +++  4385  
> 21
> mas  16  5063  27 + +++  8656  38  5159  27 + +++  4705  
> 24
> mas  16  4917  25 + +++  8682  39  5282  28 + +++  4723  
> 22
> mas  16  5027  28 + +++  8538

Re: [PERFORM] effizient query with jdbc

2005-12-22 Thread Steve Peterson

Is there a reason you can't rewrite your SELECT like:

SELECT UUID FROM MDM.KEYWORDS_INFO WHERE KEYWORDS_ID IN (a, b, c, d)

Even doing them 100 at a time will make a big difference;  you should 
put as many in the list as pgsql supports.  I'm assuming that there's 
an index over KEYWORDS_ID.


Retrieving 10,000 rows with 10,000 separate statements is generally a Bad Idea.

S

At 08:17 AM 12/22/2005, Dave Cramer wrote:

The problem is you are getting the entire list back at once.

You may want to try using a cursor.

Dave
On 15-Dec-05, at 9:44 AM, [EMAIL PROTECTED] wrote:


Hi,
I have a java.util.List of values (10000) which I wanted to use for
a query in the WHERE clause of a simple SELECT statement.
Iterating over the list and using a PreparedStatement for each value is
quite slow. Is there a more efficient way to execute such a query?

Thanks for any help.
Johannes
List ids = new ArrayList();

// ... list is filled with 10000 values ...

List uuids = new ArrayList();
PreparedStatement pstat = db.prepareStatement(
        "SELECT UUID FROM MDM.KEYWORDS_INFO WHERE KEYWORDS_ID = ?");
ResultSet rs;
for (Iterator iter = ids.iterator(); iter.hasNext();) {
    String id = (String) iter.next();
    pstat.setString(1, id);
    rs = pstat.executeQuery();
    if (rs.next()) {
        uuids.add(rs.getString(1));
    }
    rs.close();
}
...








[PERFORM] improving write performance for logging application

2006-01-03 Thread Steve Eckmann

I have questions about how to improve the write performance of PostgreSQL for 
logging data from a real-time simulation. We found that MySQL 4.1.3 could log 
about 1480 objects/second using MyISAM tables or about 1225 objects/second 
using InnoDB tables, but PostgreSQL 8.0.3 could log only about 540 
objects/second. (test system: quad-Itanium2, 8GB memory, SCSI RAID, GigE 
connection from simulation server, nothing running except system processes and 
database system under test)

We also found that we could improve MySQL performance significantly using MySQL's 
"INSERT" command extension allowing multiple value-list tuples in a single 
command; the rate for MyISAM tables improved to about 2600 objects/second. PostgreSQL 
doesn't support that language extension. Using the COPY command instead of INSERT might 
help, but since rows are being generated on the fly, I don't see how to use COPY without 
running a separate process that reads rows from the application and uses COPY to write to 
the database. The application currently has two processes: the simulation and a data 
collector that reads events from the sim (queued in shared memory) and writes them as 
rows to the database, buffering as needed to avoid lost data during periods of high 
activity. To use COPY I think we would have to split our data collector into two 
processes communicating via a pipe.

Query performance is not an issue: we found that when suitable indexes are 
added PostgreSQL is fast enough on the kinds of queries our users make. The 
crux is writing rows to the database fast enough to keep up with the simulation.

Are there general guidelines for tuning the PostgreSQL server for this kind of 
application? The suggestions I've found include disabling fsync (done), 
increasing the value of wal_buffers, and moving the WAL to a different disk, 
but these aren't likely to produce the 3x improvement that we need. On the 
client side I've found only two suggestions: disable autocommit and use COPY 
instead of INSERT. I think I've effectively disabled autocommit by batching up 
to several hundred INSERT commands in each PQexec() call, and it isn’t clear 
that COPY is worth the effort in our application.

Thanks.




Re: [PERFORM] improving write performance for logging application

2006-01-04 Thread Steve Eckmann




Tom Lane wrote:
> Steve Eckmann <[EMAIL PROTECTED]> writes:
>> We also found that we could improve MySQL performance significantly
>> using MySQL's "INSERT" command extension allowing multiple value-list
>> tuples in a single command; the rate for MyISAM tables improved to
>> about 2600 objects/second. PostgreSQL doesn't support that language
>> extension. Using the COPY command instead of INSERT might help, but
>> since rows are being generated on the fly, I don't see how to use COPY
>> without running a separate process that reads rows from the
>> application and uses COPY to write to the database.
>
> Can you conveniently alter your application to batch INSERT commands
> into transactions?  Ie
>
> 	BEGIN;
> 	INSERT ...;
> 	... maybe 100 or so inserts ...
> 	COMMIT;
> 	BEGIN;
> 	... lather, rinse, repeat ...
>
> This cuts down the transactional overhead quite a bit.  A downside is
> that you lose multiple rows if any INSERT fails, but then the same would
> be true of multiple VALUES lists per INSERT.
>
> 			regards, tom lane

Thanks for the suggestion, Tom. Yes, I think I could do that. But I
thought what I was doing now was effectively the same, because the
PostgreSQL 8.0.0 Documentation says (section 27.3.1): "It is allowed to
include multiple SQL commands (separated by semicolons) in the command
string. Multiple queries sent in a single PQexec call are processed in
a single transaction." Our simulation application has nearly 400 event
types, each of which is a C++ class for which we have a corresponding
database table. So every thousand events or so I issue one PQexec()
call for each event type that has unlogged instances, sending INSERT
commands for all instances. For example,

    PQexec(dbConn, "INSERT INTO FlyingObjectState VALUES (...); INSERT
    INTO FlyingObjectState VALUES (...); ...");

My thought was that this would be a good compromise between minimizing
transactions (one per event class per buffering interval instead of one
per event) and minimizing disk seeking (since all queries in a single
transaction insert rows into the same table). Am I overlooking
something here? One thing I haven't tried is increasing the buffering
interval from 1000 events to, say, 10,000. It turns out that 1000 is a
good number for Versant, the object database system we're replacing,
and for MySQL, so I assumed it would be a good number for PostgreSQL,
too.

Regards,  Steve





Re: [PERFORM] improving write performance for logging application

2006-01-04 Thread Steve Eckmann




Steinar H. Gunderson wrote:
> On Tue, Jan 03, 2006 at 04:44:28PM -0700, Steve Eckmann wrote:
>> Are there general guidelines for tuning the PostgreSQL server for this
>> kind of application? The suggestions I've found include disabling fsync
>> (done),
>
> Are you sure you really want this? The results could be catastrophic in
> case of a crash.
>
>> On the client side I've found only two suggestions: disable autocommit
>> and use COPY instead of INSERT. I think I've effectively disabled
>> autocommit by batching up to several hundred INSERT commands in each
>> PQexec() call, and it isn't clear that COPY is worth the effort in our
>> application.
>
> I'm a bit confused here: How can you batch multiple INSERTs into large
> statements for MySQL, but not batch multiple INSERTs into COPY statements
> for PostgreSQL?
>
> Anyhow, putting it all inside one transaction (or a few) is likely to help
> quite a lot, but of course less when you have fsync=false. Bundling
> multiple statements in each PQexec() call won't really give you that;
> you'll have to tell the database so explicitly.
>
> /* Steinar */

Thanks, Steinar. I don't think we would really run with fsync off, but
I need to document the performance tradeoffs. You're right that my
explanation was confusing; probably because I'm confused about how to
use COPY! I could batch multiple INSERTs using COPY statements, I just
don't see how to do it without adding another process to read from
STDIN, since the application that is currently the database client is
constructing rows on the fly. I would need to get those rows into some
process's STDIN stream or into a server-side file before COPY could be
used, right?

Your comment about bundling multiple statements in each PQexec() call
seems to disagree with a statement in 27.3.1 that I interpret as saying
each PQexec() call corresponds to a single transaction. Are you sure my
interpretation is wrong?

Regards, Steve




Re: [PERFORM] improving write performance for logging application

2006-01-04 Thread Steve Eckmann




dlang wrote:
> On Tue, 3 Jan 2006, Tom Lane wrote:
>
>> Steve Eckmann <[EMAIL PROTECTED]> writes:
>>> We also found that we could improve MySQL performance significantly
>>> using MySQL's "INSERT" command extension allowing multiple value-list
>>> tuples in a single command; the rate for MyISAM tables improved to
>>> about 2600 objects/second. PostgreSQL doesn't support that language
>>> extension. Using the COPY command instead of INSERT might help, but
>>> since rows are being generated on the fly, I don't see how to use COPY
>>> without running a separate process that reads rows from the
>>> application and uses COPY to write to the database.
>>
>> Can you conveniently alter your application to batch INSERT commands
>> into transactions?  Ie
>>
>> 	BEGIN;
>> 	INSERT ...;
>> 	... maybe 100 or so inserts ...
>> 	COMMIT;
>> 	BEGIN;
>> 	... lather, rinse, repeat ...
>>
>> This cuts down the transactional overhead quite a bit.  A downside is
>> that you lose multiple rows if any INSERT fails, but then the same would
>> be true of multiple VALUES lists per INSERT.
>
> Steve, you mentioned that your data collector buffers the data before
> sending it to the database; modify it so that each time it goes to send
> things to the database it sends all the data that's in the buffer as a
> single transaction.
>
> I am working on using postgres to deal with log data and wrote a simple
> perl script that read in the log files a line at a time, and then wrote
> them 1000 at a time to the database. On a dual Opteron 240 box with 2G of
> ram and 1x 15krpm SCSI drive (and an untuned postgres install with the
> compile time defaults) I was getting 5000-8000 lines/sec (I think this
> was with fsync disabled, but I don't remember for sure), and postgres was
> complaining that it was overrunning its log sizes (which limits the speed
> as it then has to pause to flush the logs).
>
> the key thing is to send multiple lines with one transaction as tom shows
> above.
>
> David Lang

Thanks, David. I will look more carefully at how to batch multiple rows
per PQexec() call.  Regards, Steve.




Re: [PERFORM] improving write performance for logging application

2006-01-04 Thread Steve Eckmann

Ian Westmacott wrote:
> We have a similar application that's doing upwards of 2B inserts
> per day.  We have spent a lot of time optimizing this, and found the
> following to be most beneficial:
>
> 1)  use COPY (BINARY if possible)
> 2)  don't use triggers or foreign keys
> 3)  put WAL and tables on different spindles (channels if possible)
> 4)  put as much as you can in each COPY, and put as many COPYs as
>     you can in a single transaction.
> 5)  watch out for XID wraparound
> 6)  tune checkpoint* and bgwriter* parameters for your I/O system

Thanks, Ian. I will look at how to implement your suggestions.

Regards, Steve



Re: [PERFORM] improving write performance for logging application

2006-01-04 Thread Steve Eckmann




Tom Lane wrote:
> Steve Eckmann <[EMAIL PROTECTED]> writes:
>> Thanks for the suggestion, Tom. Yes, I think I could do that. But I
>> thought what I was doing now was effectively the same, because the
>> PostgreSQL 8.0.0 Documentation says (section 27.3.1): "It is allowed to
>> include multiple SQL commands (separated by semicolons) in the command
>> string. Multiple queries sent in a single PQexec call are processed in
>> a single transaction." Our simulation application has nearly 400 event
>> types, each of which is a C++ class for which we have a corresponding
>> database table. So every thousand events or so I issue one PQexec() call
>> for each event type that has unlogged instances, sending INSERT commands
>> for all instances. For example,
>>
>>     PQexec(dbConn, "INSERT INTO FlyingObjectState VALUES (...); INSERT
>>     INTO FlyingObjectState VALUES (...); ...");
>
> Hmm.  I'm not sure if that's a good idea or not.  You're causing the
> server to take 1000 times the normal amount of memory to hold the
> command parsetrees, and if there are any O(N^2) behaviors in parsing
> you could be getting hurt badly by that.  (I'd like to think there are
> not, but would definitely not swear to it.)  OTOH you're reducing the
> number of network round trips which is a good thing.  Have you actually
> measured to see what effect this approach has?  It might be worth
> building a test server with profiling enabled to see if the use of such
> long command strings creates any hot spots in the profile.
>
> 			regards, tom lane

No, I haven't measured it. I will compare this approach with others
that have been suggested. Thanks.  -steve




Re: [PERFORM] improving write performance for logging application

2006-01-04 Thread Steve Eckmann




Kelly Burkhart wrote:
> On 1/4/06, Steve Eckmann <[EMAIL PROTECTED]> wrote:
>> Thanks, Steinar. I don't think we would really run with fsync off, but
>> I need to document the performance tradeoffs. You're right that my
>> explanation was confusing; probably because I'm confused about how to
>> use COPY! I could batch multiple INSERTs using COPY statements, I just
>> don't see how to do it without adding another process to read from
>> STDIN, since the application that is currently the database client is
>> constructing rows on the fly. I would need to get those rows into some
>> process's STDIN stream or into a server-side file before COPY could be
>> used, right?
>
> Steve,
>
> You can use COPY without resorting to another process.  See the libpq
> documentation for "Functions Associated with the COPY Command".  We do
> something like this:
>
>     char *mbuf;
>
>     // allocate space and fill mbuf with appropriately formatted data somehow
>
>     PQexec( conn, "begin" );
>     PQexec( conn, "copy mytable from stdin" );
>     PQputCopyData( conn, mbuf, strlen(mbuf) );
>     PQputCopyEnd( conn, NULL );
>     PQexec( conn, "commit" );
>
> -K

Thanks for the concrete example, Kelly. I had read the relevant libpq
doc but didn't put the pieces together.

Regards,  Steve




Re: [PERFORM] Bad iostat numbers

2006-12-06 Thread Steve Atkins


On Dec 5, 2006, at 8:54 PM, Greg Smith wrote:
> On Tue, 5 Dec 2006, Craig A. James wrote:
>
>> I'm not familiar with the inner details of software RAID, but the
>> only circumstance I can see where things would get corrupted is if
>> the RAID driver writes a LOT of blocks to one disk of the array
>> before synchronizing the others...
>
> You're talking about whether the discs in the RAID are kept
> consistent. While it's helpful with that, too, that's not the main
> reason a battery-backed cache is so helpful.  When PostgreSQL
> writes to the WAL, it waits until that data has really been placed
> on the drive before it enters that update into the database.  In a
> normal situation, that means that you have to pause until the disk
> has physically written the blocks out, and that puts a fairly low
> upper limit on write performance that's based on how fast your
> drives rotate.  RAID 0, RAID 1, none of that will speed up the time
> it takes to complete a single synchronized WAL write.
>
> When your controller has a battery-backed cache, it can immediately
> tell Postgres that the WAL write completed successfully, while
> actually putting it on the disk later.  On my systems, this results
> in simple writes going 2-4X as fast as they do without a cache.
> Should there be a PC failure, as long as power is restored before
> the battery runs out that transaction will be preserved.
>
> What Alex is rightly pointing out is that a software RAID approach
> doesn't have this feature.  In fact, in this area performance can
> be even worse under SW RAID than what you get from a single disk,
> because you may have to wait for multiple discs to spin to the
> correct position and write data out before you can consider the
> transaction complete.

So... the ideal might be a RAID1 controller with BBU for the WAL and
something else, such as software RAID, for the main data array?

Cheers,
  Steve




Re: [PERFORM] Dispatch-Merge pattern

2007-03-15 Thread Steve Atkins


On Mar 13, 2007, at 6:36 AM, James Riordan wrote:
> Howdy-
>
> I am currently using PostgreSQL to store and process a high-bandwidth
> event stream.  I do not need old events, but the delete and vacuum do
> not terminate due to the large number of events being inserted (it
> just pushes me over the tipping point of where the machine can keep up
> with the events).
>
> I ended up implementing a scheme where a trigger is used to redirect
> the events (round robin based on time) to a series of identically
> structured tables.  I can then TRUNCATE older tables rather than
> DELETE and VACUUM (which is a significant speed up).
>
> It worked out pretty well, so I thought I would post the idea to find
> out if
>
>  - it is a stupid way of doing things and there is a correct database
>    abstraction for doing this
>
> or
>
>  - it is a reasonable way of solving this problem and might be of use
>    to other folks using RDBMSs for event processing
>
> I then use a view to merge the tables.  Obviously UPDATE would be a
> problem; for my purposes, and I suppose for a lot of event processing,
> it isn't an issue.
>
> Either way, details are at:
>   http://unsyntax.net/james/blog/tools+and+programming/2007/03/08/Dispatch-Merge-Database-Pattern

I can't reach that URL, but from what you say it sounds like you've
re-invented table partitioning.

http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html

If you do it the postgresql way you can also take advantage of
constraint exclusion, to speed up some selects on the set of
partitioned tables, and inheritance means you don't have to
maintain the union view yourself.
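
A minimal sketch of what that looks like (table names, columns and the
date range are illustrative assumptions):

    CREATE TABLE events (ts timestamptz NOT NULL, payload text);

    CREATE TABLE events_2007_03 (
        CHECK (ts >= DATE '2007-03-01' AND ts < DATE '2007-04-01')
    ) INHERITS (events);

    -- With constraint_exclusion = on, a SELECT on events with a ts range
    -- only scans the matching children, and expiring old data becomes
    -- DROP TABLE (or TRUNCATE) on the oldest child instead of DELETE
    -- plus VACUUM.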

Cheers,
  Steve




Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Steve Atkins


On Mar 22, 2007, at 10:21 AM, Craig A. James wrote:


Tino Wildenhain wrote:

Craig A. James schrieb:
...
In our case (for a variety of reasons, but this one is critical),  
we actually can't use Postgres indexing at all -- we wrote an  
entirely separate indexing system for our data...


...There is no need to store or
maintain this information along with postgres when you can store
and maintain it directly in postgres as well.


Whether we store our data inside or outside Postgres misses the  
point (in fact, most of our data is stored IN Postgres).  It's the  
code that actually performs the index operation that has to be  
external to Postgres.



On top of that, postgres has a very flexible and extensible index
system.


You guys can correct me if I'm wrong, but the key feature that's  
missing from Postgres's flexible indexing is the ability to  
maintain state across queries.  Something like this:


 select a, b, my_index_state() from foo where ...
   offset 100 limit 10 using my_index(prev_my_index_state);

The my_index_state() function would issue something like a  
"cookie", an opaque text or binary object that would record  
information about how it got from row 1 through row 99.  When you  
issue the query above, it could start looking for row 100 WITHOUT  
reexamining rows 1-99.


This could be tricky in an OLTP environment, where the "cookie"  
could be invalidated by changes to the database.  But in warehouse  
read-mostly or read-only environments, it could yield vastly  
improved performance for database web applications.


If I'm not mistaken, neither Postgres (nor Oracle, MySQL or any other
RDBMS) can do this.  I would love to be corrected.


As long as you're ordering by some column in the table then you can do
that in straight SQL:

select a, b, ts from foo where (stuff) and foo > X order by foo limit 10

Then, record the last value of foo you read, and plug it in as X the
next time around.

This has the advantage over a simple offset approach of actually
displaying all the data as a user pages through it too. (Consider
the case where the user is viewing offsets 91-100, and you delete
the record at offset 15. The user goes to the next page and will
miss the record that used to be at offset 101 and is now at offset
100).
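
Spelled out against a hypothetical table (all the names and the
literal timestamp below are made up), two consecutive page fetches
look like:

  CREATE TABLE foo (a int, b text, ts timestamptz);
  CREATE INDEX foo_ts_idx ON foo (ts);

  -- first page
  SELECT a, b, ts FROM foo ORDER BY ts LIMIT 10;

  -- the application remembers the ts of the last row it showed
  -- (say '2007-03-22 10:21:00') and starts the next page from there
  SELECT a, b, ts FROM foo
   WHERE ts > '2007-03-22 10:21:00'
   ORDER BY ts
   LIMIT 10;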

The problem is that relational databases were invented before the  
web and its stateless applications.  In the "good old days", you  
could connect to a database and work for hours, and in that  
environment cursors and such work well -- the RDBMS maintains the  
internal state of the indexing system.  But in a web environment,  
state information is very difficult to maintain.  There are all  
sorts of systems that try (Enterprise Java Beans, for example), but  
they're very complex.


I think the problem is more that most web developers aren't very good
at using the database, and tend to fall back on simplistic, wrong
approaches to displaying the data. There's a lot of monkey-see,
monkey-do in web UI design too, which doesn't help.

Cheers,
  Steve




Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Steve Atkins


On Mar 22, 2007, at 11:26 AM, Guido Neitzer wrote:


On 22.03.2007, at 11:53, Steve Atkins wrote:

As long as you're ordering by some column in the table then you can do
that in straight SQL:

select a, b, ts from foo where (stuff) and foo > X order by foo limit 10

Then, record the last value of foo you read, and plug it in as X the
next time around.


This only works if you have unique values in foo. You might have
"batch breaks" inside a list of rows with equal values for foo.


If I don't have unique values in foo, I certainly have unique values  
in (foo, pk).
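
For instance (the same hypothetical table as before, now with a
surrogate primary key; values are made up, and as of 8.2 row-value
comparisons follow the SQL-standard semantics):

  CREATE TABLE foo (id serial PRIMARY KEY, a int, b text, ts timestamptz);
  CREATE INDEX foo_ts_id_idx ON foo (ts, id);

  -- remember the (ts, id) of the last row displayed, then
  SELECT a, b, ts, id FROM foo
   WHERE (ts, id) > ('2007-03-22 10:21:00', 1234)
   ORDER BY ts, id
   LIMIT 10;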




But: a web application that needs state and doesn't maintain it by
itself (or inside the dev toolkit) is imho broken by design. How
should the database store a "state" for a web app? It's only possible
on the web app side, because either the app is stateless - and so are
its queries to the database, which have to be re-evaluated for every
request since the request might come from a totally different source
(user, ...) - or it is stateful and has to maintain the state itself,
because only the app developer knows what information is needed for
the "current state".


This is why all web application toolkits have a "session" concept.


Yes. HTTP is not very stateful. Web applications are stateful. There  
are some really obvious approaches to maintaining state cleanly that  
work well with databases and let you do some quite complex stuff  
(tying a persistent database connection to a single user, for  
instance). But they don't scale at all well.


What Craig was suggesting is, basically, to assign a persistent
database connection to each user. But rather than maintain that
connection as a running process, to serialise all the state out of
the database connection and store it in the webapp, then, when the
next action from that user comes in, take a database connection and
stuff all that state into it again.


It's a lovely idea, but strikes me as completely infeasible in the  
general case. There's just too much state there. Doing it in the  
specific case is certainly possible, but rapidly devolves to the  
standard approach of "On the first page of results, run the query and  
record the first 5000 results. Store those in a scratch table,  
indexed by session-id, or in external storage. On displaying later  
pages of results to the same user, pull directly from the already  
calculated results."
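
A rough sketch of that scratch-table variant (all names hypothetical,
reusing the made-up foo table from earlier; a temporary sequence
stands in for numbering the rows in the order the subquery returns
them):

  CREATE TABLE result_cache (
      session_id  text,
      pos         int,
      a           int,
      b           text,
      PRIMARY KEY (session_id, pos)
  );

  -- first request: run the expensive query once and store its output
  CREATE TEMPORARY SEQUENCE result_pos;
  INSERT INTO result_cache (session_id, pos, a, b)
      SELECT 'session-42', nextval('result_pos'), a, b
      FROM (SELECT a, b FROM foo ORDER BY ts LIMIT 5000) AS q;

  -- later pages for the same user read only the cached rows
  SELECT a, b FROM result_cache
   WHERE session_id = 'session-42'
   ORDER BY pos
   OFFSET 90 LIMIT 10;

  -- when the session expires
  DELETE FROM result_cache WHERE session_id = 'session-42';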





I think the problem is more that most web developers aren't very good
at using the database, and tend to fall back on simplistic, wrong
approaches to displaying the data. There's a lot of monkey-see,
monkey-do in web UI design too, which doesn't help.


Sure. That is the other problem ... ;-) But, and I think this is
much more important: most toolkits today free you from using the
database directly and writing lots and lots of lines of SQL code
which instantly breaks when you switch the storage backend. It's
just a question of where you're looking at it from.


The real problem is the user-interface is designed around what is  
easy to implement in elderly cgi scripts, rather than what's  
appropriate to the data being displayed or useful to the user.  
Displaying tables of results, ten at a time, is just one of the more  
egregious examples of that.


Cheers,
  Steve



