[PERFORM] A Basic Question

2003-10-03 Thread shyamperi
12:28p
Dear All,
This question is regarding the performance of queries in general.
The performance of a query will vary depending on the number of tuples it
returns and on the algorithm chosen for the retrieval. If the relation returns
zero tuples, a sequential or an index scan is the best option; if it returns
one or more, the rest of the PG-supported scans can be the best choice.
Here is where I have a concern: my relation works fast when it returns more
than one tuple, but gets slow when it returns zero tuples.
How should I go about this?

-
Warm Regards
Shÿam Peri

II Floor, Punja Building,
M.G.Road,
Ballalbagh,
Mangalore-575003 
Ph : 91-824-2451001/5
Fax : 91-824-2451050 


DISCLAIMER: This message contains privileged and confidential information and
is intended only for the individual named. If you are not the intended
recipient you should not disseminate, distribute, store, print, copy or
deliver this message. Please notify the sender immediately by e-mail if
you have received this e-mail by mistake and delete this e-mail from
your system.
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] runtime of the same query in function differs on 2

2003-10-03 Thread Andriy Tkachuk
No: the function is calc_total(int,int,int) and the table columns have the
same types.

As Tom said, my problem is caused by planning in pl/pgsql.  As is
written in
http://www.postgresql.org/docs/7.3/static/plpgsql.html#PLPGSQL-OVERVIEW
plans for queries in pl/pgsql are made just once - when they are
first used in the function by a backend. So AFAICS this planning does not
take table statistics into consideration because it doesn't see the values
of the variables in the queries (or, if it does see them, it must not take
them into account, because they may change in future calls of the function).

I rolled back to my previous implementation of calc_total() in pl/tcl. There
I use spi_exec, so the query is always treated as dynamic - it is always
parsed, rewritten, and planned, but overall it executes much faster
:)
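
(For completeness: the same effect is available in pl/pgsql via
FOR ... IN EXECUTE, which re-plans the statement on every call. An untested
sketch, using the names from this thread and assuming integer arguments so
no quoting is needed:)

create or replace function calc_total_dyn(int4, int4, int4) returns int4 as '
declare
    usr alias for $1;
    d1  alias for $2;
    d2  alias for $3;
    rec record;
begin
    -- EXECUTE plans the query each time, so the planner sees the
    -- actual values instead of generic parameters.
    for rec in execute ''select sum(cost) as total from bills ''
            || ''where (parent(user_id) = '' || usr || '' or user_id = '' || usr
            || '') and dat >= '' || d1 || '' and dat < '' || d2 loop
        return coalesce(rec.total, 0);
    end loop;
    return 0;
end;
' language 'plpgsql';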

On Fri, 3 Oct 2003, Gaetano Mendola wrote:

> Andriy Tkachuk wrote:
>
> > Hi folks.
> >
> > What's wrong with planner that executes my query in function?:
> > (i mean no explanation but runtime)
> >
> >
> > tele=# EXPLAIN analyze select calc_total(6916799, 1062363600, 1064955599);
> >  QUERY PLAN
> > -------------------------------------------------------------------------------------------
> >  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=36919.37..36919.37 rows=1 loops=1)
> >  Total runtime: 36919.40 msec
> >
> > tele=# \df+ calc_total
> > ...
> > declare
> > usr alias for $1;
> > d1 alias for $2;
> > d2   alias for $3;
> > res integer;
> > begin
> > select sum(cost) into res
> > from bills where
> > (parent(user_id) = usr or user_id = usr)
> > and dat >= d1 and dat < d2;
> > if res is not null then
> > return res;
> > else
> > return 0;
> > end if;
> > end;
>
> You didn't write the types of d1 and d2; I had the same problem:
>
> declare
> a_user alias for $1;
> res INTEGER;
> begin
>   select cost into res
>   from my_table
>   where login = a_user;
>
>   ..
> end;
>
> the problem was that login was a VARCHAR and a_user was a TEXT, so
> the index was not used; it was enough to cast a_user::varchar.
>
>
> I believe that your dat, d1, d2 are not "index" comparable.
>
>
> Gaetano
>
> ---(end of broadcast)---
> TIP 7: don't forget to increase your free space map settings


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] A Basic Question

2003-10-03 Thread Richard Huxton
On Friday 03 October 2003 07:34, [EMAIL PROTECTED] wrote:
> 12:28p
> Dear All,
> This question is regarding the performance of queries in general.
> The performance of a query will vary depending on the number of tuples it
> returns and on the algorithm chosen for the retrieval. If the relation
> returns zero tuples, a sequential or an index scan is the best option; if
> it returns one or more, the rest of the PG-supported scans can be the best
> choice. Here is where I have a concern: my relation works fast when it
> returns more than one tuple, but gets slow when it returns zero tuples.
> How should I go about this?

If PG has to examine a lot of tuples to rule them out, then returning no rows 
can take longer.

If you post EXPLAIN ANALYSE output for both queries, someone will be able to 
explain why in your case.
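
Something like the following, run once with a value that matches rows and
once with a value that matches nothing (table and column names here are
obviously made up - substitute your own):

EXPLAIN ANALYSE SELECT * FROM mytable WHERE mycol = 'common_value';
EXPLAIN ANALYSE SELECT * FROM mytable WHERE mycol = 'absent_value';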

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] count(*) slow on large tables

2003-10-03 Thread Christopher Browne
Oops! [EMAIL PROTECTED] (Dror Matalon) was seen spray-painting on a wall:
> I smell a religious war in the air :-). 
> Can you go several days in a row without doing select count(*) on any
> of your tables? 

I would be more likely, personally, to run "VACUUM VERBOSE ANALYZE",
which has useful side-effects :-).

> I suspect that this is somewhat a domain specific issue. In some
> areas you don't need to know the total number of rows in your
> tables, in others you do.

"Relationship tables," which don't contain data in their own right,
but which, instead, link together records in other tables, are likely
to have particularly useless COUNT(*)'s.

> I also suspect that you're right, that end user applications don't
> use this information as often as DBAs would. On the other hand, it
> seems whenever you want to optimize your app (something relevant to
> this list), one of the things you do need to know is the number of
> rows in your table.

Ah, but in the case of optimization, there's little need for
"transactionally safe, MVCC-managed, known-to-be-exact" values.
Approximations are plenty good enough to get the right plan.

Furthermore, it's not the number of rows that is most important when
optimizing queries; the number of pages are more relevant to the
matter, as that's what the database is slinging around.
-- 
(reverse (concatenate 'string "ac.notelrac.teneerf" "@" "454aa"))
http://www3.sympatico.ca/cbbrowne/multiplexor.html
Rules of  the Evil Overlord #134. "If  I am escaping in  a large truck
and the hero is pursuing me in  a small Italian sports car, I will not
wait for the hero to pull up along side of me and try to force him off
the road  as he attempts to climb  aboard. Instead I will  slam on the
brakes  when he's  directly behind  me.  (A  rudimentary  knowledge of
physics can prove quite useful.)" 

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] count(*) slow on large tables

2003-10-03 Thread Jeff
On Thu, 2 Oct 2003, Christopher Browne wrote:

> I can't imagine why the raw number of tuples in a relation would be
> expected to necessarily be terribly useful.
>

We use stuff like that for reporting queries.

example:
On our message boards each post is a row.  The powers that be like to know
how many posts there are in total (in addition to 'today') -
"select count(*) from posts" is how it has been
done on our Informix db.  With our port to PG I instead select reltuples
from pg_class.
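
That is, roughly (reltuples and relpages are estimates maintained by
VACUUM/ANALYZE, not live values, but that's fine for reporting):

SELECT reltuples, relpages FROM pg_class WHERE relname = 'posts';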

I know when I login to a new db (or a db unknown to me) the first thing I do
is look at the tables and see what sort of data there is.. but in code I'd
rarely do that.

I know some monitoring things around here also do a select count(*) on
sometable to ensure it is growing, but like you said, this is easily done
with the number of pages as well.

Yes, Informix caches this data. I believe Oracle does too.

MySQL with InnoDB does the same thing PG does. (MyISAM caches it.)

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] Postgres low end processing.

2003-10-03 Thread Stef
Hi everyone,

I've been trying to find out if some guidelines
exist, somewhere, describing how postgres
can possibly run on less than 8MB of RAM.
(Disk space not an issue).

The closest thread I could find in the list 
archives is :
http://archives.postgresql.org/pgsql-general/2002-06/msg01343.php

Is it possible to have a stripped-down version of 
postgres that will use an absolute minimal amount
of memory? 

Maybe by switching off some features/options
at compile time, and/or configuration tweaks?
(Or anything else)

This will be on very low end i386 architecture.
Performance penalties are expected and
will be accepted. I will need the
functionality of >= 7.3.4 , at least.

Any help will be much appreciated.

Regards
Stef






[PERFORM] low cardinality column

2003-10-03 Thread rwu . cbnco . com
Hi,

I have a select like this:

SELECT MAX(transactionid) FROM cbntransaction WHERE transactiontypeid=0;

in the query:
transactionid is the primary key of the cbntransaction table,
but transactiontypeid is a low cardinality column: there are over 100,000
records with the same transactiontypeid.
I tried to create an index on (transactiontypeid, transactionid), but
no luck on that; postgresql will still scan the table.
I'm wondering if there's a solution for this query.
Maybe something like partitioning the table by transactiontypeid
and building a local index on transactionid for each partition, but I
couldn't find any doc on how to do that in postgresql.
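
(Or is the usual trick of rewriting MAX() as an ORDER BY ... DESC LIMIT 1,
so it can walk the (transactiontypeid, transactionid) index backwards,
applicable here? An untested sketch:)

SELECT transactionid
  FROM cbntransaction
 WHERE transactiontypeid = 0
 ORDER BY transactiontypeid DESC, transactionid DESC
 LIMIT 1;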

Thanks in advance,
rong :-)



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] inferior SCSI performance

2003-10-03 Thread Vivek Khera
> "CB" == Christopher Browne <[EMAIL PROTECTED]> writes:

CB> Unfortunately, while there are companies hawking SSDs, they are in the
CB> "you'll have to talk to our salescritter for pricing" category, which
CB> means that they must be ferociously expensive.  :-(.

You ain't kidding.  Unfortunately, one of the major vendors just went
belly up (Imperial Technology) so pricing probably won't get any
better anytime soon.

Perhaps one of these days I'll try that experiment on my SSD... ;-)


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Is This My Speed Limit?

2003-10-03 Thread scott.marlowe
On Thu, 2 Oct 2003, CN wrote:

> Hi!
> 
> It's just my curiosity. I wonder if there is any way to break my speed
> limit on AMD 450Mhz:

You're most likely I/O bound, not CPU bound here.  So, if you want better 
speed, you'll likely need a better storage subsystem.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] count(*) slow on large tables

2003-10-03 Thread Jean-Luc Lachance
Well, I can think of many more cases where it would be useful:

SELECT COUNT(DISTINCT x) FROM ...
SELECT COUNT(*) FROM ... WHERE x = ?


Also, having the transaction number (visibility information) in the index
would tip the balance more toward an index scan than a seq scan, because you
would not have to look up visibility in the data file. We all know this has
been an issue many times.
Having a different index file structure when the index is not UNIQUE
would help too.
The last page of a non-unique index could hold more stats.



Christopher Browne wrote:
> 
> [EMAIL PROTECTED] (Jean-Luc Lachance) writes:
> > That's one of the draw back of MVCC.
> > I once suggested that the transaction number and other house keeping
> > info be included in the index, but was told to forget it...
> > It would solve once and for all the issue of seq_scan vs index_scan.
> > It would simplify the aggregate problem.
> 
> It would only simplify _one_ case, namely the case where someone cares
> about the cardinality of a relation, and it would do that at
> _considerable_ cost.
> 
> A while back I outlined how this would have to be done, and for it to
> be done efficiently, it would be anything BUT simple.
> 
> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"
> 
> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.
> 
> And it still begs the same question, of why the result of this query
> would be particularly meaningful to anyone.  I don't see the
> usefulness; I don't see the value of going to the considerable effort
> of "fixing" this purported problem.
> --
> let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
> 
> Christopher Browne
> (416) 646 3304 x124 (land)
> 
> ---(end of broadcast)---
> TIP 4: Don't 'kill -9' the postmaster

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Postgres low end processing.

2003-10-03 Thread Tom Lane
Stef <[EMAIL PROTECTED]> writes:
> I've been trying to find out if some guidelines
> exist, somewhere, describing how postgres
> can possibly run on less than 8MB of RAM.

Are you sure you want Postgres, and not something smaller?  BDB,
or SQL Lite, for example?

"Postgres is bloatware by design: it was built to house PhD theses."
-- J. Hellerstein (who ought to know)

But having said that ... given virtual memory and cramped configuration
settings, Postgres would certainly run in an 8M machine.  Maybe "crawl"
would be a more applicable verb than "run", but you could execute it.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Postgres low end processing.

2003-10-03 Thread Stef
On Fri, 03 Oct 2003 11:42:54 -0400
Tom Lane <[EMAIL PROTECTED]> wrote:

=> Are you sure you want Postgres, and not something smaller?  BDB,
=> or SQL Lite, for example?
I have considered various options, including BDB and SQL Lite, but
alas, it will have to be postgres if it's going to be a database. Otherwise
it will be back to the original idea of flat .idx files :(
 
=> "Postgres is bloatware by design: it was built to house PhD theses."
=> -- J. Hellerstein (who ought to know)
 :o)  Believe me, I've been amazed since I encountered postgres v6.3.2
in '98

=> But having said that ... given virtual memory and cramped configuration
=> settings, Postgres would certainly run in an 8M machine.  Maybe "crawl"
=> would be a more applicable verb than "run", but you could execute it.

Crawling is ok. Won't differ much from normal operation on a machine like that.
Any tips on how to achieve the most diminutive vmem and conf settings?
I tried to figure this out from the docs, and played around with
backend/storage, but I'm not really winning.

Regards
Stef




Re: [PERFORM] Postgres low end processing.

2003-10-03 Thread Tom Lane
Stef <[EMAIL PROTECTED]> writes:
> Crawling is ok. Won't differ much from normal operation on a machine
> like that.  Any tips on how to achieve the most diminutive vmem and
> conf settings?

The out-of-the-box settings are already pretty diminutive on current
releases :-(.  In 7.4 you'd likely want to knock back shared_buffers
and max_connections, and maybe the fsm settings if the database is going
to be tiny.
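
Something along these lines in postgresql.conf, perhaps (illustrative,
untested values only):

shared_buffers = 16        # the hard minimum: 16 x 8KB pages
max_connections = 4
sort_mem = 256             # KB per sort/hash operation
vacuum_mem = 1024          # KB
max_fsm_relations = 100
max_fsm_pages = 1000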

> I tried to figure this out from the docs, and played
> around with backend/storage, but I'm not really winning.

What exactly is failing?  And what's the platform, anyway?

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] count(*) slow on large tables

2003-10-03 Thread Hilary Forbes
We frequently need to know the number of tuples in a table, although sometimes
we do have WHERE status='X', for example, but this often doesn't guarantee an
indexed scan. And yes, my reasons are the same - reporting figures, e.g. the
number of bookings made since the system was introduced.   Have you tried doing

SELECT count(pkey)

rather than count(*),

where pkey is the primary key (assuming you have a single field that is a
primary key or a unique indexed key)?  This is MUCH faster in my experience.
If you don't have such an animal, I'd seriously suggest putting in a serial
number and recreating the table with that as the primary key.

The vacuuming bit is not accurate enough for us in many instances.  Also, a
count can easily be fed into other programs/web pages etc. without having to
parse the vacuum output.

Hilary

At 23:22 02/10/2003 -0700, you wrote:

>I can tell you that this is one of the first thing applications' programmers and IT 
>managers notice. It can slightly tarnish postgres' image when it takes it many long 
>seconds to do what other databases can do in a snap. The "whys and wherefores" can be 
>hard to get across once they see the comparative numbers.
>
>When I use Informix "dbaccess" it has a "status" which will tell me the row
>count of a table virtually instantly -- it can be locked out by a user with
>an exclusive lock, so it's not entirely independent of the table (like a
>stored value in one of the system catalog tables).
>
>This is not to say Informix is "right" and Postgres is "wrong" ... but it is
>something that virtually any newcomer will run into headlong, with resulting
>bruises and contusions, not to mention confusion.
>
>At the very least this needs to be VERY clearly explained right up front, along with 
>some of the possible work-arounds, depending on what one is really after with this 
>info.
>
>Greg Williamson
>DBA
>GlobeXplorer LLC
>
>-Original Message-
>From:   Dror Matalon [mailto:[EMAIL PROTECTED]
>Sent:   Thu 10/2/2003 9:27 PM
>To: [EMAIL PROTECTED]
>Cc: 
>Subject:Re: [PERFORM] count(*) slow on large tables
>
>
>I smell a religious war in the air :-). 
>Can you go several days in a row without doing select count(*) on any
>of your tables? 
>
>I suspect that this is somewhat a domain specific issue. In some areas
>you don't need to know the total number of rows in your tables, in
>others you do. 
>
>I also suspect that you're right, that end user applications don't use
>this information as often as DBAs would. On the other hand, it seems
>whenever you want to optimize your app (something relevant to this list),
>one of the things you do need to know is the number of rows in your
>table.
>
>Dror
>
>On Thu, Oct 02, 2003 at 10:08:18PM -0400, Christopher Browne wrote:
>> The world rejoiced as [EMAIL PROTECTED] (Dror Matalon) wrote:
>> > I don't have an opinion on how hard it would be to implement the
>> > tracking in the indexes, but "select count(*) from some table" is, in my
>> > experience, a query that people tend to run quite often. 
>> > One of the databases that I've used, I believe it was Informix, had that
>> > info cached so that it always knew how many rows there were in any
>> > table. It was quite useful.
>> 
>> I can't imagine why the raw number of tuples in a relation would be
>> expected to necessarily be terribly useful.
>> 
>> I'm involved with managing Internet domains, and it's only when people
>> are being pretty clueless that anyone imagines that "select count(*)
>> from domains;" would be of any use to anyone.  There are enough "test
>> domains" and "inactive domains" and other such things that the raw
>> number of "things in the table" aren't really of much use.
>> 
>> - I _do_ care how many pages a table occupies, to some extent, as that
>> determines whether it will fit in my disk space or not, but that's not
>> COUNT(*).
>> 
>> - I might care about auditing the exact numbers of records in order to
>> be assured that a data conversion process was done correctly.  But in
>> that case, I want to do something a whole *lot* more detailed than
>> mere COUNT(*).
>> 
>> I'm playing "devil's advocate" here, to some extent.  But
>> realistically, there is good reason to be skeptical of the merits of
>> using SELECT COUNT(*) FROM TABLE for much of anything.
>> 
>> Furthermore, the relation that you query mightn't be a physical
>> "table."  It might be a more virtual VIEW, and if that's the case,
>> bets are even MORE off.  If you go with the common dictum of "good
>> design" that users don't directly access tables, but go through VIEWs,
>> users may have no way to get at SELECT COUNT(*) FROM TABLE.
>> -- 
>> output = reverse("ac.notelrac.teneerf" "@" "454aa")
>> http://www.ntlug.org/~cbbrowne/finances.html
>> Rules  of  the  Evil  Overlord  #74.   "When  I  create  a  multimedia
>> presentation of my plan designed  so that my five-year-old advisor can
>> easily  understand the  details, I  will not  label the  disk "Project
>

Re: [PERFORM] TPC-R benchmarks

2003-10-03 Thread Oleg Lebedev
Josh,

I declared all the indexes that you suggested and ran vacuum full
analyze. The query plan has not changed and it's still trying to use
seqscan. I tried to disable seqscan, but the plan didn't change. Any
other suggestions?
I started explain analyze on the query, but I doubt it will finish any
time soon.
Thanks.

Oleg


-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 02, 2003 11:27 PM
To: Oleg Lebedev; scott.marlowe
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] TPC-R benchmarks


Oleg,

> I have another question. How do I optimize my indexes for the query 
> that contains a lot of ORed blocks, each of which contains a bunch of 
> ANDed expressions? The structure of each ORed block is the same except

> the right-hand-side values vary.

Given the example, I'd do a multicolumn index on (p_brand, p_container,
p_size) and a second multicolumn index on (l_partkey, l_quantity,
l_shipmode).  Hmmm ... or maybe separate indexes: one on l_partkey and one
on (l_quantity, l_shipmode, l_instruct).   Test both configurations.

Mind you, if this is also an OLTP table, then you'd want to test those
multi-column indexes to determine the fewest columns you need for the
indexes still to be used, since more columns = more index maintenance.
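
In DDL terms, the first configuration would be something like this (index
names invented; columns per the TPC-R part and lineitem tables):

CREATE INDEX part_brand_cont_size ON part (p_brand, p_container, p_size);
CREATE INDEX lineitem_pkey_qty_ship ON lineitem (l_partkey, l_quantity, l_shipmode);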

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

*

This e-mail may contain privileged or confidential material intended for the named 
recipient only.
If you are not the named recipient, delete this message and all attachments.
Unauthorized reviewing, copying, printing, disclosing, or otherwise using information 
in this e-mail is prohibited.
We reserve the right to monitor e-mail sent through our network. 

*

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] TPC-R benchmarks

2003-10-03 Thread Josh Berkus
Oleg,

> I declared all the indexes that you suggested and ran vacuum full
> analyze. The query plan has not changed and it's still trying to use
> seqscan. I tried to disable seqscan, but the plan didn't change. Any
> other suggestions?
> I started explain analyze on the query, but I doubt it will finish any
> time soon.

Can I get a copy of the database so that I can tinker?   I'm curious now, plus 
I want our benchmarks to look good.

I have a private FTP if that helps.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Postgres low end processing.

2003-10-03 Thread Stef
On Fri, 03 Oct 2003 12:32:00 -0400
Tom Lane <[EMAIL PROTECTED]> wrote:

=> What exactly is failing?  And what's the platform, anyway?

Nothing is really failing atm, except the funds for better 
hardware. JBOSS and some other servers need to be 
run on these machines, along with linux, which will be 
a minimal RH >= 7.2 with kernel 2.4.21
(Any better suggestions here?)

In this case, whatever is the least amount of memory
postgres can run on, is what is needed. So this is still
a kind of feasibility study. Of course, it will still be thoroughly
tested, if it turns out to be possible. (Which I know it is, but not how)

Regards
Stef




Re: [PERFORM] TPC-R benchmarks

2003-10-03 Thread Oleg Lebedev
Josh,
My data directory is 3.8 GB.
I can send you flat data files and scripts to create indices, but still
it would be about 1.3 GB of data. Do you still want me to transfer data
to you? If yes, then just give me your FTP address.
Thanks.

Oleg

-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 03, 2003 11:22 AM
To: Oleg Lebedev; scott.marlowe
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] TPC-R benchmarks


Oleg,

> I declared all the indexes that you suggested and ran vacuum full 
> analyze. The query plan has not changed and it's still trying to use 
> seqscan. I tried to disable seqscan, but the plan didn't change. Any 
> other suggestions? I started explain analyze on the query, but I doubt

> it will finish any time soon.

Can I get a copy of the database so that I can tinker?   I'm curious
now, plus 
I want our benchmarks to look good.

I have a private FTP if that helps.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Postgres low end processing.

2003-10-03 Thread Josh Berkus
Stef,

> I've been trying to find out if some guidelines
> exist, somewhere, describing how postgres
> can possibly run on less than 8MB of RAM.
> (Disk space not an issue).

I can tell you from experience that you will get some odd behaviour, and even 
connection failures, when Postgres is forced into swap by lack of memory.  
Also, you will run into trouble with the default Linux kernel 2.4 VM manager, 
which allows applications to overcommit memory; either hack your own memory 
manager, or designate swap space >= 200% of RAM.

Also, program your application to expect, and recover from, PostgreSQL 
failures and connection failures.

> Is it possible to have a stripped-down version of
> postgres that will use an absolute minimal amount
> of memory?

I don't know that there is anything you can remove safely from the postgres 
core that would save you memory.

> Maybe by switching off some features/options
> at compile time, and/or configuration tweaks?
> (Or anything else)

You're in luck; the default postgresql.conf file for 7.3 is actually configured 
for a low-memory, slow machine setup (which most other people bitch about).
Here's a few other things you can do:

1. Make sure that the WAL files (pg_xlog) are on a separate disk from the 
database files, either through mounting or symlinking.

2. Tweak the .conf file for low vacuum_mem (1024?), but vacuum very 
frequently, like every 1-5 minutes.  Spend some time tuning your 
max_fsm_pages to the ideal level so that you're not allocating any extra 
memory to the FSM.

3. If your concern is *average* CPU/RAM consumption, and not peak load 
activity, increase wal_files and checkpoint_segments to do more efficient 
batch processing of pending updates at the cost of some disk space.  If peak 
load activity is a problem, don't do this.

4. Tune all of your queries carefully to avoid anything requiring a 
RAM-intensive merge join or CPU-eating calculated expression hash join, or 
similar computation-or-RAM-intensive operations.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Postgres low end processing.

2003-10-03 Thread Ron Johnson
On Fri, 2003-10-03 at 12:52, Stef wrote:
> On Fri, 03 Oct 2003 12:32:00 -0400
> Tom Lane <[EMAIL PROTECTED]> wrote:
> 
> => What exactly is failing?  And what's the platform, anyway?
> 
> Nothing is really failing atm, except the funds for better 
> hardware. JBOSS and some other servers need to be 
> run on these machines, along with linux, which will be 
> a minimal RH >= 7.2 with kernel 2.4.21
> (Any better suggestions here?)
> 
> In this case, whatever is the least amount of memory
> postgres can run on, is what is needed. So this is still
> a kind of feasibility study. Of course, it will still be thoroughly
> tested, if it turns out to be possible. (Which I know it is, but not how)

JBOSS, PostgreSQL & 2.4.21 all on a computer w/ 8MB RAM?  A 486 or
*very* low end Pentium?

It'll thrash (in the literal sense) the page files.  *No* work 
will get done.

-- 
-
Ron Johnson, Jr. [EMAIL PROTECTED]
Jefferson, LA USA

"...always eager to extend a friendly claw"


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Postgres low end processing.

2003-10-03 Thread Neil Conway
On Fri, 2003-10-03 at 14:08, Josh Berkus wrote:
> I can tell you from experience that you will get some odd behaviour, and even 
> connection failures, when Postgres is forced into swap by lack of memory.

Why would you get a connection failure? And other than poor performance,
why would you get "odd behavior" due to a lack of physical memory?

-Neil



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Postgres low end processing.

2003-10-03 Thread scott.marlowe
On Fri, 3 Oct 2003, Ron Johnson wrote:

> On Fri, 2003-10-03 at 12:52, Stef wrote:
> > On Fri, 03 Oct 2003 12:32:00 -0400
> > Tom Lane <[EMAIL PROTECTED]> wrote:
> > 
> > => What exactly is failing?  And what's the platform, anyway?
> > 
> > Nothing is really failing atm, except the funds for better 
> > hardware. JBOSS and some other servers need to be 
> > run on these machines, along with linux, which will be 
> > a minimal RH >= 7.2 with kernel 2.4.21
> > (Any better suggestions here?)
> > 
> > In this case, whatever is the least amount of memory
> > postgres can run on, is what is needed. So this is still
> > a kind of feasibility study. Of course, it will still be thoroughly
> > tested, if it turns out to be possible. (Which I know it is, but not how)
> 
> JBOSS, PostgreSQL & 2.4.21 all on a computer w/ 8MB RAM?  A 486 or
> *very* low end Pentium?
> 
> It'll thrash (in the literal sense) the page files.  *No* work 
> will get done.

I built a test server four years ago on a P100 with 64 Megs of RAM and it 
was already a pretty slow / old box at that time.

Considering that those kind of beasts sell by the pound nowadays, I can't 
imagine torturing yourself by using a 486 with 8 megs of ram.  Even my 
ancient 486DX50 Toshiba 4700 has 16 Megs of ram in it.

If one has to develop in such a low-end environment, you're much better 
off either writing Perl CGI or using PHP, which both use much less memory 
than JBoss.

I don't think I'd try to run JBoss / Postgresql on anything less than 64 
or 128 Meg of RAM.  Even then you're probably looking at having a fair bit 
of swapping going on.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Postgres low end processing.

2003-10-03 Thread Richard Welty
On Fri, 03 Oct 2003 11:42:54 -0400 Tom Lane <[EMAIL PROTECTED]> wrote:
> "Postgres is bloatware by design: it was built to house PhD theses."
> -- J. Hellerstein (who ought to know)

if postgres is bloatware, what is oracle 9i?

(after i downloaded a copy of oracle 8i a couple of months back, i swore i'd
never complain about the size of postgresql ever ever again.)

richard
-- 
Richard Welty [EMAIL PROTECTED]
Averill Park Networking 518-573-7592
Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Joins on inherited tables

2003-10-03 Thread apb18
OK, so I've had a bit of time to look things over, and appear to be 
making headway.  Here's how things stand right now:

I added a function called best_inner_scan, used the same way as 
best_inner_indexscan, but a bit more generalized in the sense that 
it can make append plans comprising the best scans for each 
constituent table (it makes calls to best_inner_indexscan for each 
child), or just return the best simple index scan (or null) for plain 
relations.

In order to make that work, I gave the child tables modified join 
clauses from the parent... modified in the sense that I had to make the 
operands match the inherited child table (they would otherwise match the 
parent if I were to simply copy the Joininfo nodes, and thus fail to 
find an appropriate index in the child table).  I'm not entirely 
comfortable with that solution yet, as I'm not absolutely certain those 
additional modified join clauses won't affect something else in the code 
that I'm not aware of, but it appears to be having the desired effect.

Basically, with optimizer debug enabled, I'm getting plans that look like 
this (with the same queries as before) 

RELOPTINFO (1 2): rows=501 width=19
cheapest total path:
NestLoop(1 2) rows=501 cost=0.00..1253.67
  clauses: numbers.id = ids.id
SeqScan(1) rows=1 cost=0.00..0.02
Append(2)  rows=100051 cost=0.00..3.01

As opposed to this:

RELOPTINFO (1 2): rows=501 width=19
cheapest total path:
HashJoin(1 2) rows=501 cost=0.00..2195.79
  clauses: numbers.id = ids.id
SeqScan(1) rows=1 cost=0.00..0.02
Append(2)  rows=100051 cost=0.00..1690.50

The total cost seems high for the nestloop.. its constituents are 
certainly cheap. I need to check whether I missed keeping track of 
costs somewhere.

When I EXPLAIN, though, I get an error from the executor:
"ERROR: both left and right operands are rel-vars".  I haven't looked 
into that yet, but the results so far are encouraging enough to press on 
and get this completed.  

There was one hairy part, though, which will have to be addressed at some 
later point: right now there is a boolean 'inh' in the RangeTblEntry 
struct which indicates "inheritance requested".  When the inheritance root 
is first expanded by expand_inherited_rtentry(), rte->inh is 
nulled in order to prevent expansion of an UPDATE/DELETE target.  This 
presented problems for me when I wanted to detect which relation was an 
inheritor in order to expand it into the append path.  For my testing 
purposes, I just commented out that line, but that's not acceptable for a 
real solution, and some struct might have to be changed slightly in order 
to convey the inheritance knowledge.. 

So, I guess the next step is to see what the executor is complaining 
about and whether it's something that needs attention in the executor 
or something I did wrong.. If everything appears to work after 
that point, then I'll check for efficiency and use of cache in generating 
the inner scan plans.

Thanks for the advice and historical perspective so far, Tom. It has been 
quite helpful.  

-Aaron


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] Speeding up Aggregates

2003-10-03 Thread Dror Matalon
Hi,

I have a query that ran quite well initially, but slowed down quite a
bit once I introduced an aggregate into the equation. The average
execution time went up from around 15 msec to around 300 msec. 

The original query fetches a bunch of articles:

select articlenumber, channel, description, title, link, dtstamp  from
items, my_channels where items.channel = '2' and my_channels.id =
'2' and owner = 'drormata'  and dtstamp > last_viewed and
articlenumber not in (select item from viewed_items where channel
='2' and owner = 'drormata');


I then added a call to a function:

and (dtstamp = item_max_date(2, link))


item_max_date() looks like this:
   select max(dtstamp) from items where channel = $1 and link = $2;

This should eliminate duplicate articles and only show the most recent
one.

resulting in the following query

select articlenumber, channel, description, title, link, dtstamp  from
items, my_channels where items.channel = '2' and my_channels.id =
'2' and owner = 'drormata'  and dtstamp > last_viewed and
articlenumber not in (select item from viewed_items where channel
='2' and owner = 'drormata') and (dtstamp = item_max_date(2,
link));



Any suggestions on optimizing the query/function? It makes sense that 
it slowed down, but I wonder if I can do better.

I'm including index list as well as "explain analyze" of both versions.

Indexes:
"item_channel_link" btree (channel, link)
"item_created" btree (dtstamp)
"item_signature" btree (signature)
"items_channel_article" btree (channel, articlenumber)


explain analyze select articlenumber, channel, description, title, link, dtstamp
  from items, my_channels
 where items.channel = '2' and my_channels.id = '2' and owner = 'drormata'
   and dtstamp > last_viewed
   and articlenumber not in (select item from viewed_items where channel = '2' and owner = 'drormata');

 QUERY PLAN
 ---------------------------------------------------------------------------------------------------
 Nested Loop  (cost=8.19..6982.58 rows=302 width=259) (actual time=16.95..17.16 rows=8 loops=1)
   Join Filter: ("inner".dtstamp > "outer".last_viewed)
   ->  Seq Scan on my_channels  (cost=0.00..3.23 rows=1 width=8) (actual time=0.36..0.38 rows=1 loops=1)
         Filter: ((id = 2) AND (("owner")::text = 'drormata'::text))
   ->  Index Scan using items_channel_article on items  (cost=8.19..6968.05 rows=904 width=259) (actual time=0.68..13.94 rows=899 loops=1)
         Index Cond: (channel = 2)
         Filter: (NOT (hashed subplan))
         SubPlan
           ->  Seq Scan on viewed_items  (cost=0.00..8.19 rows=2 width=4) (actual time=0.48..0.48 rows=0 loops=1)
                 Filter: ((channel = 2) AND (("owner")::text = 'drormata'::text))
 Total runtime: 17.42 msec
(11 rows)


explain analyze select articlenumber, channel, description, title, link, dtstamp
  from items, my_channels
 where items.channel = '2' and my_channels.id = '2' and owner = 'drormata'
   and dtstamp > last_viewed
   and articlenumber not in (select item from viewed_items where channel = '2' and owner = 'drormata')
   and (dtstamp = item_max_date(2, link));

 QUERY PLAN
 ---------------------------------------------------------------------------------------------------
 Nested Loop  (cost=8.19..6980.33 rows=1 width=259) (actual time=262.94..265.14 rows=7 loops=1)
   Join Filter: ("outer".dtstamp > "inner".last_viewed)
   ->  Index Scan using items_channel_article on items  (cost=8.19..6977.08 rows=1 width=259) (actual time=1.94..150.55 rows=683 loops=1)
         Index Cond: (channel = 2)
         Filter: ((dtstamp = item_max_date(2, link)) AND (NOT (hashed subplan)))
         SubPlan
           ->  Seq Scan on viewed_items  (cost=0.00..8.19 rows=2 width=4) (actual time=0.43..0.43 rows=0 loops=1)
                 Filter: ((channel = 2) AND (("owner")::text = 'drormata'::text))
   ->  Seq Scan on my_channels  (cost=0.00..3.23 rows=1 width=8) (actual time=0.14..0.15 rows=1 loops=683)
         Filter: ((id = 2) AND (("owner")::text = 'drormata'::text))
 Total runtime: 265.39 msec




-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Rob Nagler
I've read some posts that says vacuum doesn't lock, but my experience
today indicates the opposite.  It seemed that "vacuum full analyze"
was locked waiting and so were other postmaster processes.  It
appeared to be a deadlock, because all were in "WAITING" state according
to ps.  I let this go for about half an hour, and then killed the vacuum,
at which point all other processes completed normally.

The same thing seemed to be happening with reindex on a table.  It
seems that the reindex locks the table and some other resource which
then causes deadlock with other active processes.

Another issue seems to be performance.  A reindex on some indexes is
taking 12 minutes or so.  Vacuum seems to be slow, too.  Way longer
than the time it takes to reimport the entire database (30 mins).

In summary, I suspect that it is better from a UI perspective to bring
down the app on Sat at 3 a.m. and reimport within a fixed time period
than to live through reindexing/vacuuming which may deadlock.  Am I
missing something?

Thanks,
Rob



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Tom Lane
Rob Nagler <[EMAIL PROTECTED]> writes:
> I've read some posts that says vacuum doesn't lock, but my experience
> today indicates the opposite.  It seemed that "vacuum full analyze"
> was locked waiting and so were other postmaster processes.

vacuum full does require exclusive lock, plain vacuum does not.

> It
> appeared to be deadlock, because all were in "WAITING" state according
> to ps.  I let this go for about a 1/2 hour, and then killed the vacuum
> at which point all other processes completed normally.

It's considerably more likely that the vacuum was waiting for an open
client transaction (that had a read or write lock on some table) to
finish than that there was an undetected deadlock.  I suggest looking at
your client code.  Also, in 7.3 or later you could look at the pg_locks
view to work out exactly who has the lock that's blocking vacuum.
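
Something like this (a rough sketch; pg_locks stores the table as an OID,
so join to pg_class for a readable name):

SELECT c.relname, l.pid, l.mode, l.granted
  FROM pg_locks l LEFT JOIN pg_class c ON c.oid = l.relation
 ORDER BY l.granted, l.pid;

Rows with granted = false are the waiters; the granted rows on the same
relation show who they are waiting for.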

> Another issue seems to be performance.  A reindex on some indexes is
> taking 12 minutes or so.  Vacuum seems to be slow, too.  Way longer
> than the time it takes to reimport the entire database (30 mins).

vacuum full is indeed slow.  That's why we do not recommend it as a
routine maintenance procedure.  The better approach is to do plain
vacuums often enough that you don't need vacuum full.  In pre-7.4
releases you might need periodic reindexes too, depending on whether
your usage patterns tickle the index-bloat problem.  But it is easily
demonstrable that reindexing is cheaper than rebuilding the database.

> In summary, I suspect that it is better from a UI perspective to bring
> down the app on Sat at 3 a.m and reimport with a fixed time period
> than to live through reindexing/vacuuming which may deadlock.  Am I
> missing something?

Almost certainly, though you've not provided enough detail to determine
what.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Speeding up Aggregates

2003-10-03 Thread Josh Berkus
Dror,

> select articlenumber, channel, description, title, link, dtstamp  from
>   items, my_channels where items.channel = '2' and my_channels.id =
>   '2' and owner = 'drormata'  and dtstamp > last_viewed and
>   articlenumber not in (select item from viewed_items where channel
>   ='2' and owner = 'drormata');

the NOT IN is a bad idea unless the subselect never returns more than a 
handful of rows.  If viewed_items can grow to dozens of rows, you should 
use WHERE NOT EXISTS instead.  Unless you're using 7.4.
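
That is, roughly, with your column names (untested):

AND NOT EXISTS
    (SELECT 1 FROM viewed_items v
      WHERE v.channel = '2'
        AND v.owner = 'drormata'
        AND v.item = items.articlenumber)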

> item_max_date() looks like this:
>select max(dtstamp) from items where channel = $1 and link = $2;

Change it to 

SELECT dtstamp from items where channel = $1 and link = $2
ORDER BY dtstamp DESC LIMIT 1

and possibly build an index on channel, link, dtstamp
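
e.g. (index name invented):

CREATE INDEX item_channel_link_dtstamp ON items (channel, link, dtstamp);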

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Speeding up Aggregates

2003-10-03 Thread Dror Matalon

Hi Josh,

On Fri, Oct 03, 2003 at 02:07:10PM -0700, Josh Berkus wrote:
> Dror,
> 
> > select articlenumber, channel, description, title, link, dtstamp  from
> > items, my_channels where items.channel = '2' and my_channels.id =
> > '2' and owner = 'drormata'  and dtstamp > last_viewed and
> > articlenumber not in (select item from viewed_items where channel
> > ='2' and owner = 'drormata');
> 
> the NOT IN is a bad idea unless the subselect never returns more than a 
> handful of rows.  If viewed_items can grow to dozens  of rows, wyou should 
> use WHERE NOT EXISTS instead.  Unless you're using 7.4.
> 

I am using 7.4, and had tried NOT EXISTS and didn't see any
improvements.

> > item_max_date() looks like this:
> >select max(dtstamp) from items where channel = $1 and link = $2;
> 
> Change it to 
> 
> SELECT dtstamp from iterm where channel = $1 and link = $2
> ORDER BY dtstamp DESC LIMIT 1
> 

Didn't make a difference. And plugging real values into this query as
well as into  the original 
  select max(dtstamp) from items where channel = $1 and link = $2;

and doing an explain analyze shows that the cost is the same. The
strange thing is that when I run the above queries by hand they take
about .5 msec. Yet on a result set that fetches 5 rows, I go up from 15
msec to 300 msec. It would seem like it should be something like 15 +
(0.5 * 5) + small overhead = 30 msec or so, rather than the 300 I'm
seeing.

> and possibly build an index on channel, link, dtstamp

Didn't make a difference either. Explain analyze shows that it didn't
use it.

> 
> -- 
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Speeding up Aggregates

2003-10-03 Thread Josh Berkus
Dror,

> I am using 7.4, and had tried NOT EXISTS and didn't see any
> improvements.

It wouldn't if you're using 7.4, which has improved IN performance immensely.

What happens if you stop using a function and instead use a subselect?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Speeding up Aggregates

2003-10-03 Thread Rod Taylor
> item_max_date() looks like this:
>select max(dtstamp) from items where channel = $1 and link = $2;

It is too bad the (channel, link) index doesn't have dtstamp at the end
of it, otherwise the below query would be a gain (might be a small one
anyway).

  select dtstamp
    from items
   where channel = $1
     and link = $2
ORDER BY dtstamp DESC
   LIMIT 1;


Could you show us the exact specification of the function?  In
particular, did you mark it VOLATILE, IMMUTABLE, or STABLE?

I hope it isn't the first or second one ;)




Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Rob Nagler
> vacuum full does require exclusive lock, plain vacuum does not.

I think I need full, because there are updates on the table.  As I
understand it, an update in pg is an insert/delete, so it needs
to be garbage collected.

> It's considerably more likely that the vacuum was waiting for an open
> client transaction (that had a read or write lock on some table) to
> finish than that there was an undetected deadlock.  I suggest looking at
> your client code.  Also, in 7.3 or later you could look at the pg_locks
> view to work out exactly who has the lock that's blocking vacuum.

My client code does a lot.  I look at it more often than I'd like to. :-) 

I don't understand why the client transaction would block if vacuum
was waiting.  Does vacuum lock the table and then try to get some
other "open transaction" resource?  Free space?  I guess I don't
understand what other resources would be required of vacuum.  The
client transactions are short (< 1s).  They don't deadlock normally,
only with reindex and vacuum did I see this behavior.

> vacuum full is indeed slow.  That's why we do not recommend it as a
> routine maintenance procedure.  The better approach is to do plain
> vacuums often enough that you don't need vacuum full.

The description of vacuum full implies that it is required if the db
is updated frequently.   This db gets about 1 txn a second, possibly
more at peak load.

> In pre-7.4
> releases you might need periodic reindexes too, depending on whether
> your usage patterns tickle the index-bloat problem.

7.3, and yes, we have date indexes as well as sequences for primary
keys.
  
> But it is easily
> demonstrable that reindexing is cheaper than rebuilding the database.

IOW, vacuum+reindex is faster than dump+restore?  I didn't see this,
then again, I had this locking problem, so the stats are distorted.

One other question: The reindex seems to lock the table for the entire
process as opposed to freeing the lock between index rebuilds.  It was
hard to see, but it seemed like the clients were locked for the entire
"reindex table bla" command.

Sorry for lack of detail, but I didn't expect these issues so I wasn't
keeping track of the system state as closely as I should have.  Next
time. :-)

Thanks,
Rob

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Bruno Wolff III
On Fri, Oct 03, 2003 at 15:47:01 -0600,
  Rob Nagler <[EMAIL PROTECTED]> wrote:
> > vacuum full does require exclusive lock, plain vacuum does not.
> 
> I think I need full, because there are updates on the table.  As I
> understand it, an update in pg is an insert/delete, so it needs
> to be garbage collected.

Plain vacuum will mark the space used by deleted tuples as reusable.
Most of the time this is good enough and you don't need to run vacuum full.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Speeding up Aggregates

2003-10-03 Thread Dror Matalon
On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
> > item_max_date() looks like this:
> >select max(dtstamp) from items where channel = $1 and link = $2;
> 
> It is too bad the (channel, link) index doesn't have dtstamp at the end
> of it, otherwise the below query would be a gain (might be a small one
> anyway).
> 
>   select dtstamp
> from items
>where channel = $1
>  and link = $2
> ORDER BY dtstamp DESC
>LIMIT 1;

Similar idea to what Josh suggested. I did create an additional index
with dtstamp at the end and it doesn't look like the planner used it.
Using the above query instead of max() didn't improve things either.

> 
> 
> Could you show us the exact specification of the function?  In
> particular, did you mark it VOLATILE, IMMUTABLE, or STABLE?
> 
> I hope it isn't the first or second one ;)

CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS
timestamptz AS '
select max(dtstamp) from items where channel = $1 and link = $2;
' LANGUAGE 'sql';




-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Speeding up Aggregates

2003-10-03 Thread Dror Matalon
On Fri, Oct 03, 2003 at 02:35:46PM -0700, Josh Berkus wrote:
> Dror,
> 
> > I am using 7.4, and had tried NOT EXISTS and didn't see any
> > improvements.
> 
> It wouldn't if you're using 7.4, which has improved IN performance immensely.
> 
> What happens if you stop using a function and instead use a subselect?

An improvement. Now I'm getting in the 200 msec response time. 

And by the way, I tried "not exists" again and it actually runs slower
than "not in."

> 
> -- 
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
> 
> 
> ---(end of broadcast)---
> TIP 3: if posting/reading through Usenet, please send an appropriate
>   subscribe-nomail command to [EMAIL PROTECTED] so that your
>   message can get through to the mailing list cleanly

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Speeding up Aggregates

2003-10-03 Thread Rod Taylor
On Fri, 2003-10-03 at 17:53, Dror Matalon wrote:
> On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
> > > item_max_date() looks like this:
> > >select max(dtstamp) from items where channel = $1 and link = $2;
> > 
> > It is too bad the (channel, link) index doesn't have dtstamp at the end
> > of it, otherwise the below query would be a gain (might be a small one
> > anyway).
> > 
> >   select dtstamp
> > from items
> >where channel = $1
> >  and link = $2
> > ORDER BY dtstamp DESC
> >LIMIT 1;

It didn't make a difference even with the 3-term index? I guess you
don't have very many common values for the channel/link combination.



How about the below? Note the word STABLE on the end.

CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS
timestamptz AS '
select max(dtstamp) from items where channel = $1 and link = $2;
' LANGUAGE 'sql' STABLE;




Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Neil Conway
On Fri, 2003-10-03 at 17:47, Rob Nagler wrote:
> They don't deadlock normally,
> only with reindex and vacuum did I see this behavior.

If you can provide a reproducible example of a deadlock induced by
REINDEX + VACUUM, that would be interesting.

(FWIW, I remember noticing a potential deadlock in the REINDEX code and
posting to -hackers about it, but I've never seen it occur in a
real-world situation...)

> One other question: The reindex seems to lock the table for the entire
> process as opposed to freeing the lock between index rebuilds.

Yeah, I wouldn't be surprised if there is some room for optimizing the
locks that are acquired by REINDEX.

-Neil



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Speeding up Aggregates

2003-10-03 Thread Dror Matalon
On Fri, Oct 03, 2003 at 06:10:29PM -0400, Rod Taylor wrote:
> On Fri, 2003-10-03 at 17:53, Dror Matalon wrote:
> > On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
> > > > item_max_date() looks like this:
> > > >select max(dtstamp) from items where channel = $1 and link = $2;
> > > 
> > > It is too bad the (channel, link) index doesn't have dtstamp at the end
> > > of it, otherwise the below query would be a gain (might be a small one
> > > anyway).
> > > 
> > >   select dtstamp
> > > from items
> > >where channel = $1
> > >  and link = $2
> > > ORDER BY dtstamp DESC
> > >LIMIT 1;
> 
> It didn't make a difference even with the 3 term index? I guess you
> don't have very many common values for channel / link combination.

There's no noticeable difference between two term and three term
indexes.

> 
> 
> 
> How about the below? Note the word STABLE on the end.
> 
> CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS
> timestamptz AS '
> select max(dtstamp) from items where channel = $1 and link = $2;
> ' LANGUAGE 'sql' STABLE;

Made no difference.



-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Christopher Browne
[EMAIL PROTECTED] (Rob Nagler) writes:
> I've read some posts that says vacuum doesn't lock, but my experience
> today indicates the opposite.  It seemed that "vacuum full analyze"
> was locked waiting and so were other postmaster processes.  It
> appeared to be deadlock, because all were in "WAITING" state according
> to ps.  I let this go for about a 1/2 hour, and then killed the vacuum
> at which point all other processes completed normally.

VACUUM FULL certainly does lock.

See the man page:

   INPUTS
   FULL   Selects ``full'' vacuum, which may reclaim more space, but takes
  much longer and exclusively locks the table.

The usual answer is that you probably _didn't_ want to VACUUM FULL.

Plain VACUUM (without FULL) does NOT block updates.
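
A minimal illustration (the table name here is hypothetical):

    VACUUM ANALYZE items;  -- plain vacuum: marks dead tuples reusable and
                           -- runs alongside normal reads and writes
    VACUUM FULL items;     -- compacts the table, but holds an exclusive
                           -- lock for its entire duration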

> The same thing seemed to be happening with reindex on a table.  It
> seems that the reindex locks the table and some other resource which
> then causes deadlock with other active processes.

Not surprising either.  While the reindex takes place, updates to that
table have to be deferred.

> Another issue seems to be performance.  A reindex on some indexes is
> taking 12 minutes or so.  Vacuum seems to be slow, too.  Way longer
> than the time it takes to reimport the entire database (30 mins).

That seems a little surprising.

> In summary, I suspect that it is better from a UI perspective to
> bring down the app on Sat at 3 a.m and reimport with a fixed time
> period than to live through reindexing/vacuuming which may deadlock.
> Am I missing something?

Consider running pg_autovacuum, and thereby do a little bit of
vacuuming here and there all the time.  It DOESN'T block, so unless
your system is really busy, it shouldn't slow things down to a major
degree.
-- 
"cbbrowne","@","libertyrms.info"

Christopher Browne
(416) 646 3304 x124 (land)



Re: [PERFORM] count(*) slow on large tables

2003-10-03 Thread Christopher Browne
In the last exciting episode, [EMAIL PROTECTED] (Jean-Luc Lachance) wrote:
> Well I can think of many more case where it would be usefull:
>
> SELECT COUNT(DISTINCT x) FROM ...
> SELECT COUNT(*) FROM ... WHERE x = ?

Those are precisely the cases that the "other databases" ALSO fall
down on.

Maintaining those sorts of statistics would mean [in _ANY_ database;
PostgreSQL is at no disadvantage here] that each and every update has
to update a whole host of statistic values.

It would be fairly reasonable to have a trigger, in PostgreSQL, to
manage this sort of information.  It would not be outrageously
difficult to substantially improve the performance of such queries, at
the considerable cost that each and every update would also have to
update a statistics table.

If you're doing a whole lot of these sorts of queries, then it is a
reasonable idea to create appropriate triggers for the (probably very
few) tables where you are doing these counts.
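
A sketch of what such a trigger might look like (all names are
hypothetical, and it assumes plpgsql is installed; note that every
writer serializes on the single counter row, which is exactly the
update cost described below):

CREATE TABLE rowcount (
    tablename  text PRIMARY KEY,
    cnt        bigint NOT NULL
);

CREATE OR REPLACE FUNCTION track_rowcount() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE rowcount SET cnt = cnt + 1 WHERE tablename = TG_RELNAME;
    ELSE  -- DELETE
        UPDATE rowcount SET cnt = cnt - 1 WHERE tablename = TG_RELNAME;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER items_rowcount AFTER INSERT OR DELETE ON items
    FOR EACH ROW EXECUTE PROCEDURE track_rowcount();

-- seed the counter once, then count(*) becomes a one-row lookup:
INSERT INTO rowcount SELECT 'items', count(*) FROM items;
SELECT cnt FROM rowcount WHERE tablename = 'items';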

But the notion that this should automatically be applied to all tables,
always, is a dangerous one.  It would make update performance Suck
Badly, because the extra statistical updates would be quite expensive.
-- 
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','cbbrowne.com').
http://www3.sympatico.ca/cbbrowne/multiplexor.html
I'm sorry Dave, I can't let you do that.
Why don't you lie down and take a stress pill?



Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Matt Clark
> > vacuum full does require exclusive lock, plain vacuum does not.
>
> I think I need full, because there are updates on the table.  As I
> understand it, an update in pg is an insert/delete, so it needs
> to be garbage collected.

Yes and no.  You only need a plain VACUUM that is run often enough to
recover space as fast as you need to grab it.  For heavily updated tables
run it often - I run it every 5 minutes on some tables.  A VACUUM FULL is
only needed if you haven't been running VACUUM often enough in the first
place.
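
Scheduling that is easy enough with cron; a sketch (the database and
table names are hypothetical):

    */5 * * * * psql -c "VACUUM ANALYZE hot_table" mydb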

> The description of vacuum full implies that it is required if the db
> is updated frequently.   This db gets about 1 txn a second, possibly
> more at peak load.

Assuming you mean 1 update/insert per second, that is an absolutely
_trivial_ load on any reasonable hardware.  You can do thousands of
updates/second on hardware costing less than $2000.  If you vacuum every
hour then you will be fine.

> IOW, vacuum+reindex is faster than dump+restore?  I didn't see this,
> then again, I had this locking problem, so the stats are distorted.

REINDEX also locks tables like VACUUM FULL.  Either is terribly slow, but
unless you turn off fsync during the restore it's unlikely to be slower than
dump & restore.

Matt




Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Josh Berkus
Rob,

> > I think I need full, because there are updates on the table.  As I
> > understand it, an update in pg is an insert/delete, so it needs
> > to be garbage collected.
> 
> Yes and no.  You only need a plain VACUUM that is run often enough to
> recover space as fast as you need to grab it.  For heavily updated tables
> run it often - I run it every 5 minutes on some tables.  A VACUUM FULL is
> only needed if you haven't been running VACUUM often enough in the first
> place.

Also, if you find that you need to run VACUUM FULL often, then you need to 
raise your max_fsm_pages.
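
For example, in postgresql.conf (the values here are purely
illustrative; size them to how many pages your tables actually turn
over between vacuums):

    max_fsm_pages = 50000
    max_fsm_relations = 1000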

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Matt Clark
> > In summary, I suspect that it is better from a UI perspective to
> > bring down the app on Sat at 3 a.m and reimport with a fixed time
> > period than to live through reindexing/vacuuming which may deadlock.
> > Am I missing something?
>
> Consider running pg_autovacuum, and thereby do a little bit of
> vacuuming here and there all the time.  It DOESN'T block, so unless
> your system is really busy, it shouldn't slow things down to a major
> degree.

My real-world experience on a *very* heavily updated OLTP-type DB,
following advice from this list (thanks guys!), is that there is
essentially zero cost to going ahead and vacuuming as often as you feel
like it.  Go crazy, and speed up your DB!

OK, that's on a quad CPU box with goodish IO, so maybe there are issues on
very slow boxen, but in a heavy-update environment the advantages seem to
easily wipe out the costs.

Matt

p.s.  Sorry to sound like a "Shake'n'Vac" advert.




Re: [PERFORM] Speeding up Aggregates

2003-10-03 Thread Rod Taylor

> > I hope it isn't the first or second one ;)
> 
> CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS
> timestamptz AS '
> select max(dtstamp) from items where channel = $1 and link = $2;
> ' LANGUAGE 'sql';


How about the below?

CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS
timestamptz AS '
select max(dtstamp) from items where channel = $1 and link = $2;
' LANGUAGE 'sql' STABLE;
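
A sketch combining the two suggestions from this thread - the
index-friendly ORDER BY form from earlier, declared STABLE so the
planner may collapse repeated calls with the same arguments within a
single statement:

CREATE OR REPLACE FUNCTION item_max_date (int4, varchar) RETURNS
timestamptz AS '
    SELECT dtstamp FROM items
     WHERE channel = $1 AND link = $2
     ORDER BY dtstamp DESC LIMIT 1;
' LANGUAGE 'sql' STABLE;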




Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Matt Clark
> Also, if you find that you need to run VACUUM FULL often, then
> you need to
> raise your max_fsm_pages.

Yes and no.  If it's run often enough then the number of tracked pages
shouldn't need to be raised, but then again...

...max_fsm_pages should be raised anyway.  I'm about to reclaim a Pentium
166 w/ 64MB of RAM from a friend I lent it to _many_ years ago, and I
suspect PG would run happily on it as configured by default.  Set it to at
least 50,000 I say.  What do you have to lose?  I mean, if pages aren't
free then they're not tracked in the FSM, right?

Of course if anyone knows a reason _not_ to raise it then I'm all ears!

Matt






Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Neil Conway
On Fri, 2003-10-03 at 17:34, Christopher Browne wrote:
> Not surprising either.  While the reindex takes place, updates to that
> table have to be deferred.

Right, but that's no reason not to let SELECTs proceed, for example.
(Whether that would actually be *useful* is another question...)

-Neil





Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Matt Clark
> > Also, if you find that you need to run VACUUM FULL often, then
> > you need to
> > raise your max_fsm_pages.
>
> Yes and no.  If it's run often enough then the number of tracked pages
> shouldn't need to be raised, but then again...

Oops, sorry, didn't pay attention and missed the mention of FULL.  My bad,
ignore my OT useless response.




Re: [PERFORM] Tuning/performance issue...

2003-10-03 Thread Bruce Momjian

I have updated the FAQ to be:

  In comparison to MySQL or leaner database systems, we are
  faster for multiple users, complex queries, and a read/write query
  load.  MySQL is faster for SELECT queries done by a few users. 

Is this accurate?  It seems so.

---

Oleg Lebedev wrote:
> Jeff,
> I would really appreciate if you could send me that lengthy presentation
> that you've written on pg/other dbs comparison.
> Thanks.
> 
> Oleg
> 
> -Original Message-
> From: Jeff [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, October 01, 2003 6:23 AM
> To: David Griffiths
> Cc: [EMAIL PROTECTED]
> Subject: Re: [PERFORM] Tuning/performance issue...
> Importance: Low
> 
> 
> On Tue, 30 Sep 2003, David Griffiths wrote:
> 
> >
> > This is all part of a "migrate away from Oracle" project. We are
> > looking at 3 databases - MySQL (InnoDB), Postgres and Matisse (object
> > oriented). We have a lot of queries like this or worse, and I'm
> > worried that many of them would need to be re-written. The developers
> > know SQL, but nothing about tuning, etc.
> >
> 
> There's a movement at my company to ditch several commercial db's in
> favor of a free one.  I'm currently the big pg fan around here and I've
> actually written a rather lengthy presentation about pg features, why,
> tuning, etc. but another part was some comparisons to other db's..
> 
> So that I wouldn't be blindly flaming MySQL, I decided to give it a
> whirl and loaded it up with the same dataset as pg.  First thing I hit
> was lack of stored procedures, but I decided to code around that,
> giving MySQL the benefit of the doubt.  What I found was interesting.
> 
> For 1-2 concurrent 'beaters' it screamed - ultra-fast.  But if you
> increase the concurrent beaters up to, say, 20, MySQL comes to a
> grinding halt: MySQL and the machine itself become fairly unresponsive.
> And if you run cache-unfriendly queries it becomes even worse.  On PG -
> no problems at all.  It scaled up fine and dandy, and with 40 concurrent
> beaters the machine was still responsive.  (The numbers for 20 clients
> were 220 seconds (pg) and 650 seconds (mysql).)
> 
> So that is another test to try out - Given your configuration I expect
> you have lots of concurrent activity.
> 
> --
> Jeff Trout <[EMAIL PROTECTED]>
> http://www.jefftrout.com/
> http://www.stuarthamm.net/
> 
> 
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] Tuning/performance issue...

2003-10-03 Thread Rod Taylor
On Fri, 2003-10-03 at 21:39, Bruce Momjian wrote:
> I have updated the FAQ to be:
> 
>   In comparison to MySQL or leaner database systems, we are
>   faster for multiple users, complex queries, and a read/write query
>   load.  MySQL is faster for SELECT queries done by a few users. 
> 
> Is this accurate?  It seems so.

May wish to say ... for simple SELECT queries ...

Queries with several left outer joins, subselects, or a large number of
joins are regularly performed faster in PostgreSQL due to a more mature
optimizer.

But MySQL can pump out SELECT * FROM table WHERE key = value; queries in
a hurry.


I've often wondered if they win on those because they have a
lighter-weight parser / optimizer with fewer "let's try simplifying this
query" steps, or because the MyISAM storage mechanism is simply quicker
at pulling data off the disk.





Re: [PERFORM] Tuning/performance issue...

2003-10-03 Thread Bruce Momjian
Rod Taylor wrote:
-- Start of PGP signed section.
> On Fri, 2003-10-03 at 21:39, Bruce Momjian wrote:
> > I have updated the FAQ to be:
> > 
> >   In comparison to MySQL or leaner database systems, we are
> >   faster for multiple users, complex queries, and a read/write query
> >   load.  MySQL is faster for SELECT queries done by a few users. 
> > 
> > Is this accurate?  It seems so.
> 
> May wish to say ... for simple SELECT queries ...

Updated.

> Queries with several left outer joins, subselects, or a large number of
> joins are regularly performed faster in PostgreSQL due to a more mature
> optimizer.
> 
> But MySQL can pump out SELECT * FROM table WHERE key = value; queries in
> a hurry.
> 
> 
> I've often wondered if they win on those because they have a
> lighter-weight parser / optimizer with fewer "let's try simplifying this
> query"

I think that is part of it.

> steps, or because the MyISAM storage mechanism is simply quicker at
> pulling data off the disk.

And their heap is indexed by MyISAM, right?  I know with Ingres that ISAM
was usually faster than btree because you didn't have all those leaves
to traverse to get to the data.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] Tuning/performance issue...

2003-10-03 Thread Christopher Browne
Centuries ago, Nostradamus foresaw when [EMAIL PROTECTED] (Bruce Momjian) would write:
> I have updated the FAQ to be:
>
>   In comparison to MySQL or leaner database systems, we are
>   faster for multiple users, complex queries, and a read/write query
>   load.  MySQL is faster for SELECT queries done by a few users. 
>
> Is this accurate?  It seems so.

I would think it more accurate if you use the phrase "faster for
simple SELECT queries."

MySQL uses a rule-based optimizer which, when the data fits the rules
well, can pump queries through lickety-split without any appreciable
pause for evaluation (or reflection :-).  That's _quite_ a successful
strategy when users are doing what loosely amounts to evaluating
association tables.

select * from table where key = value;

Which is just like tying a Perl variable to a hash table, and doing
   $value = $TABLE{$key};

In web applications where they wanted something a _little_ more
structured than hash tables, that may 'hit the spot.'

Anything hairier than that gets, of course, hairier.  If you want
something that's TRULY more structured, you may lose a lot of hair
:-).
-- 
output = reverse("gro.gultn" "@" "enworbbc")
http://www.ntlug.org/~cbbrowne/oses.html
"If you want to talk with some experts about something, go to the bar
where they hang out, buy a round of beers, and they'll surely talk
your ear off, leaving you wiser than before.

If you, a stranger, show up at the bar, walk up to the table, and ask
them to fax you a position paper, they'll tell you to call their
office in the morning and ask for a rate sheet." -- Miguel Cruz



Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> On Fri, 2003-10-03 at 17:34, Christopher Browne wrote:
>> Not surprising either.  While the reindex takes place, updates to that
>> table have to be deferred.

> Right, but that's no reason not to let SELECTs proceed, for example.

What if said SELECTs are using the index in question?

I suspect it is true that REINDEX locks more than it needs to, but we
should tread carefully about loosening it.

regards, tom lane



Re: [PERFORM] Tuning/performance issue...

2003-10-03 Thread Tom Lane
Rod Taylor <[EMAIL PROTECTED]> writes:
> I've often wondered if they win on those because they have a lighter
> weight parser / optimizer with less "lets try simplifying this query"
> steps or if the MYISAM storage mechanism is simply quicker at pulling
> data off the disk.

Comparing pre-PREPAREd queries would probably tell something about that.
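
Something along these lines would take parse and plan time out of the
comparison (the table and column names are hypothetical):

PREPARE fetch_by_key (int) AS
    SELECT * FROM items WHERE channel = $1;
EXECUTE fetch_by_key (42);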

regards, tom lane
