Re: [PERFORM] Nested loop question
On Tuesday 16 December 2003 17:06, Nick Fankhauser - Doxpop wrote:
> Hi-
>
> I'm trying to optimize a query that I *think* should run very fast.
> Essentially, I'm joining two tables that have very selective indexes and
> constraining the query on an indexed field. (There's a third small lookup
> table in the mix, but it doesn't really affect the bottom line.)
>
> I'm unsure what is happening next. I notice that an index scan is occurring
> on actor_summary_pk, with an "actual time" of 9.15, but then it looks like
> a nested loop occurs at the next level to join these tables. Does this mean
> that each probe of the actor_summary index will take 9.15 msec, but the
> nested loop is going to do this once for each actor_id?

That's right - you need to multiply the actual time by the number of loops.
In your case this would seem to be about 33 seconds.

> ->  Index Scan using actor_summary_pk on actor_summary
>       (cost=0.00..8.11 rows=1 width=72)
>       (actual time=9.14..9.15 rows=1 loops=3639)
>         Index Cond: ("outer".actor_id = actor_summary.actor_id)

> The nested loop appears to be where most of my time is going, so I'm
> focusing on this area, but don't know if there is a better approach to this
> join.
>
> Is there a more efficient means than a nested loop to handle such a join?
> Would a different method be chosen if there was exactly one row in
> actor_summary for every row in actor?

Hmm - tricky to say in your case. PG has decided to filter on actor and then
look up the corresponding values in actor_summary. Given that you have 3
million rows in both tables, that seems a reasonable approach. You could
always try forcing different plans by switching the various ENABLE_HASHJOIN
etc. options (see the runtime configuration section of the manuals). I'm not
sure that will help you here though.

The fact that it's taking you 9ms to do each index lookup suggests to me that
it's going to disk each time. Does that sound plausible, or do you think you
have enough RAM to cache your large indexes?

--
Richard Huxton
Archonet Ltd

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
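The loop arithmetic described above can be sketched directly (my own illustration, not part of the original exchange): EXPLAIN ANALYZE reports the inner index scan's "actual time" per loop, so its total contribution to the query is roughly that per-loop figure times the loop count.

```python
# Estimate the total time spent in a nested loop's inner index scan.
# EXPLAIN ANALYZE's "actual time" is per loop, so multiply by "loops".
def nested_loop_total_ms(actual_time_ms: float, loops: int) -> float:
    """Approximate total time (ms) = per-loop actual time * loop count."""
    return actual_time_ms * loops

# Figures from the plan quoted above: 9.15 ms per probe, 3639 loops.
total_ms = nested_loop_total_ms(9.15, 3639)
print(round(total_ms / 1000, 1))  # about 33 seconds, as Richard notes
```

This is why a per-probe cost that looks tiny in isolation can dominate the plan once the loop count is large.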
Re: [PERFORM] Nested loop question
> The fact that it's taking you 9ms to do each index lookup suggests to me
> that it's going to disk each time. Does that sound plausible, or do you
> think you have enough RAM to cache your large indexes?

I'm sure we don't have enough RAM to cache all of our large indexes, so your
supposition makes sense. We have 1GB on this machine.

In responding to the performance problems we're having, one of the questions
has been adding memory vs. crafting "helper" tables to speed things up. The
issue is that this database needs to be able to scale easily to about 10
times its current size, so although we could easily triple the memory at
reasonable expense, we'd eventually hit a wall.

Is there any solid method to ensure that a particular index always resides
in memory? A hybrid approach that might scale reliably would be to bump up
our memory and then make sure key indexes are cached. However, I'm concerned
that if we didn't have a way to ensure that the indexes we choose remain
cached, we would have very inconsistent response times.

Thanks for your ideas!

-Nick

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Re: [PERFORM] Nested loop performance
> It seems that your basic problem is that you're fetching lots of rows
> from two big ol' tables.
>
> It doesn't seem to me that there would be a substantially better plan
> for this query with your tables as they stand.

That's more or less the conclusion I had come to. I was just hoping someone
else could point out an approach I've been missing. (sigh!)

> If your data were more normalised, then your big scans might be quicker
> (because their rows would be smaller so they would hit fewer disk pages),

This started off as a 5-table join on well-normalized data. Unfortunately,
the actor table doesn't get any smaller, and the work involved in calculating
the "case_count" information on the fly was clearly becoming a problem -
particularly with actors that had a heavy caseload. (Busy attorneys and
judges.) The actor_summary approach makes these previous problem cases go
away, but the payback is that (as you correctly pointed out) queries on
average citizens who only have one case suffer from the de-normalized
approach.

We're currently considering the approach of just returning all of the rows
to our application and doing the aggregation and limit work in the app. The
inconsistency of the data makes it very tough for the query planner to come
up with a strategy that is always a winner.

Thanks for your thoughts!

-Nick

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
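The precomputed-summary idea discussed above can be sketched in SQL. This is a hedged illustration, not the poster's actual schema: the actor_summary and case_count names follow the thread, but the exact definition and the case_actor_link source table are assumptions of mine.

```sql
-- Hypothetical sketch of a denormalized summary table like the one
-- described in this thread: one row per actor, with the per-actor
-- case count precomputed so it need not be aggregated at query time.
CREATE TABLE actor_summary AS
SELECT
    actor_id,
    COUNT(*)     AS case_count,       -- expensive to compute on the fly
    MIN(case_id) AS single_case_id    -- meaningful when case_count = 1
FROM case_actor_link                  -- assumed link table, not shown in thread
GROUP BY actor_id;

-- The unique index mentioned in the earlier EXPLAIN output:
CREATE UNIQUE INDEX actor_summary_pk ON actor_summary (actor_id);
```

The trade-off is exactly the one Nick describes: heavy-caseload actors become cheap, but every lookup now joins against an extra wide table even for actors with a single case.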
Re: [PERFORM] Nested loop performance
> As a question, what does explain analyze give you if you
> set enable_nestloop=false; before trying the query?

Here are the results - it looks quite a bit more painful than the other plan,
although the wall time is in the same ballpark.

alpha=# explain analyze
alpha-# select
alpha-#   min(actor.actor_id) as actor_id,
alpha-#   min(actor.actor_entity_type) as actor_entity_type,
alpha-#   min(actor.role_class_code) as role_class_code,
alpha-#   min(actor.actor_full_name) as actor_full_name,
alpha-#   min(actor.actor_person_date_of_birth) as actor_person_date_of_birth,
alpha-#   min(actor.actor_entity_acronym) as actor_entity_acronym,
alpha-#   min(actor.actor_person_last_name) as actor_person_last_name,
alpha-#   min(actor.actor_person_first_name) as actor_person_first_name,
alpha-#   min(actor.actor_person_middle_name) as actor_person_middle_name,
alpha-#   min(actor.actor_person_name_suffix) as actor_person_name_suffix,
alpha-#   min(actor.actor_person_place_of_birth) as actor_person_place_of_birth,
alpha-#   min(actor.actor_person_height) as actor_person_height,
alpha-#   min(actor.actor_person_height_unit) as actor_person_height_unit,
alpha-#   min(actor.actor_person_weight) as actor_person_weight,
alpha-#   min(actor.actor_person_weight_unit) as actor_person_weight_unit,
alpha-#   min(actor.actor_person_ethnicity) as actor_person_ethnicity,
alpha-#   min(actor.actor_person_citizenship_count) as actor_person_citizenship_count,
alpha-#   min(actor.actor_person_hair_color) as actor_person_hair_color,
alpha-#   min(actor.actor_person_scars_marks_tatto) as actor_person_scars_marks_tatto,
alpha-#   min(actor.actor_person_marital_status) as actor_person_marital_status,
alpha-#   min(actor.actor_alias_for_actor_id) as actor_alias_for_actor_id,
alpha-#   min(to_char(data_source.source_last_update, 'MM/DD/ HH12:MI AM TZ')) as last_update,
alpha-#   min(actor_summary.single_case_public_id) as case_public_id,
alpha-#   min(actor_summary.single_case_id) as case_id,
alpha-#   sum(actor_summary.case_count) as case_count
alpha-# from
alpha-#   actor,
alpha-#   actor_summary,
alpha-#   data_source
alpha-# where
alpha-#   actor.actor_id = actor_summary.actor_id
alpha-#   and data_source.source_id = actor.source_id
alpha-#   and actor.actor_full_name_uppercase like upper('sanders%')
alpha-# group by
alpha-#   actor.actor_id
alpha-# order by
alpha-#   min(actor.actor_full_name_uppercase),
alpha-#   case_count desc,
alpha-#   min(actor_summary.case_disp_global_code)
alpha-# limit
alpha-#   1000;

QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=168919.98..168920.03 rows=20 width=548) (actual time=91247.95..91249.05 rows=1000 loops=1)
   ->  Sort  (cost=168919.98..168920.03 rows=20 width=548) (actual time=91247.95..91248.35 rows=1001 loops=1)
         Sort Key: min((actor.actor_full_name_uppercase)::text), sum(actor_summary.case_count), min((actor_summary.case_disp_global_code)::text)
         ->  Aggregate  (cost=168904.95..168919.54 rows=20 width=548) (actual time=91015.00..91164.68 rows=3590 loops=1)
               ->  Group  (cost=168904.95..168905.95 rows=201 width=548) (actual time=90999.87..91043.25 rows=3594 loops=1)
                     ->  Sort  (cost=168904.95..168905.45 rows=201 width=548) (actual time=90999.83..91001.57 rows=3594 loops=1)
                           Sort Key: actor.actor_id
                           ->  Hash Join  (cost=903.08..168897.24 rows=201 width=548) (actual time=25470.63..90983.45 rows=3594 loops=1)
                                 Hash Cond: ("outer".actor_id = "inner".actor_id)
                                 ->  Seq Scan on actor_summary  (cost=0.00..150715.43 rows=3455243 width=73) (actual time=8.03..52902.24 rows=3455243 loops=1)
                                 ->  Hash  (cost=902.57..902.57 rows=204 width=475) (actual time=25459.92..25459.92 rows=0 loops=1)
                                       ->  Hash Join  (cost=1.14..902.57 rows=204 width=475) (actual time=155.92..25451.25 rows=3639 loops=1)
                                             Hash Cond: ("outer".source_id = "inner".source_id)
                                             ->  Index Scan using actor_full_name_uppercase on actor  (cost=0.00..897.20 rows=223 width=463) (actual time=144.93..25404.10 rows=3639 loops=1)
                                                   Index Cond: ((actor_full_name_uppercase >= 'SANDERS'::character varying) AND (actor_full_name_uppercase < 'SANDERT'::character varying))
                                                   Filter: (actor_full_name_uppercase ~~ 'SANDERS%'::text)
                                             ->  Hash  (cost=1.11..1.11 rows=11 width=12) (actual time=10.66..10.66 rows=0 loops=1)
Re: [PERFORM] Excessive rows/tuples seriously degrading query
Hannu Krosing <[EMAIL PROTECTED]> writes:
> Chadwick, Russell kirjutas L, 13.12.2003 kell 00:40:
>> Can anyone explain why this table which has never had more than a
>> couple rows in it shows > 500k in the query planner even after running
>> vacuum full.

> It can be that there is an idle transaction somewhere that has locked a
> lot of rows (i.e. all your updates have been running inside the same
> transaction for hours or days)

In fact an old open transaction is surely the issue, given that the VACUUM
report shows a huge number of "kept" tuples:

>> INFO:  Pages 4773: Changed 1, reaped 767, Empty 0, New 0;
>>        Tup 613737: Vac 57620, Keep/VTL 613735/613713, UnUsed 20652,
>>        MinLen 52, MaxLen 52;
>>        Re-using: Free/Avail. Space 4322596/4322596;
>>        EndEmpty/Avail. Pages 0/4773.
>>        CPU 9.11s/13.68u sec elapsed 22.94 sec.

"Keep" is the number of tuples that are committed dead but can't be removed
yet, because there is some other open transaction that is old enough that it
should be able to see them if it looks.

Apparently the access pattern on this table is constant updates of the two
logical rows, leaving lots and lots of dead versions. You need to vacuum it
more often to keep down the amount of deadwood, and you need to avoid having
very-long-running transactions open when you vacuum.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
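A hedged sketch of how one might hunt for the offending open transaction: pg_stat_activity shows backends stuck "idle in transaction" (on 7.x the query column is only populated when stats_command_string is enabled; the column names below are the pre-9.2 ones contemporary with this thread).

```sql
-- Find backends holding a transaction open - these prevent VACUUM from
-- reclaiming the "Keep" tuples reported above:
SELECT procpid, usename, current_query
FROM pg_stat_activity
WHERE current_query LIKE '<IDLE>%';

-- Once the stale transaction is closed, vacuum the hot table frequently
-- to keep dead-version bloat down ("my_table" is a placeholder name):
VACUUM ANALYZE my_table;
```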
[PERFORM] Adding RAM: seeking advice & warnings of hidden "gotchas"
Hi-

After having done my best to squeeze better performance out of our
application by tuning within our existing resources, I'm falling back on
adding memory as a short-term solution while we get creative for a long-term
fix. I'm curious about what experiences others have had with the process of
adding big chunks of RAM. In particular, if I'm trying to encourage the OS to
cache more of my index information in RAM, what sort of configuration should
I do at both the PostgreSQL and OS level?

In a slightly off-topic vein, I'd also like to hear about it if anyone knows
about any gotchas at the OS level that might become a problem.

The server is a dual processor Athlon 1.2GHz box with hardware SCSI RAID. It
currently has 1 GB RAM, and we're planning to add one GB more for a total of
2GB. The OS is Debian Linux Kernel 2.4.x, and we're on PostgreSQL v7.3.2.

My current memory-related settings are:

  SHMMAX and SHMALL set to 128MB (OS setting)
  shared_buffers 8192 (64MB)
  sort_mem 16384 (16MB)
  effective_cache_size 65536 (512MB)

We support up to 70 active users, sharing a connection pool of 16
connections. Most of the queries center around 3 tables that are about 1.5 GB
each.

Thanks.
-Nick

---------------------------------------------------------------
Nick Fankhauser
[EMAIL PROTECTED]  Phone 1.765.965.7363  Fax 1.765.962.9788
doxpop - Court records at your fingertips - http://www.doxpop.com/

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Adding RAM: seeking advice & warnings of hidden "gotchas"
On Dec 17, 2003, at 11:57 AM, Nick Fankhauser wrote:

> Hi-
>
> After having done my best to squeeze better performance out of our
> application by tuning within our existing resources, I'm falling back on
> adding memory as a short-term solution while we get creative for a
> long-term fix. I'm curious about what experiences others have had with
> the process of adding big chunks of RAM. In particular, if I'm trying to
> encourage the OS to cache more of my index information in RAM, what sort
> of configuration should I do at both the PostgreSQL and OS level?

You need bigmem compiled in the kernel, which you should already have at the
1 gig level iirc. You should bump up your effective_cache_size, probably to
around 1.75 gig. I wouldn't bump up the shared buffers beyond where you have
them now. If you're swapping out sorts to disk, you may gain by boosting
sort_mem some, since you have the additional memory to use.

> The server is a dual processor Athlon 1.2GHz box with hardware SCSI RAID.
> It currently has 1 GB RAM, and we're planning to add one GB more for a
> total of 2GB. The OS is Debian Linux Kernel 2.4.x, and we're on
> PostgreSQL v7.3.2

I've got a machine running Debian Stable w/2.4.x, 1.3 ghz p3, 1.5 gig ram,
pg 7.2.4 and it's rock solid.

eric

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
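As a back-of-the-envelope check (my own sketch, not from the thread): in PostgreSQL of this vintage, effective_cache_size is expressed in 8 kB disk pages, so "around 1.75 gig" converts to a setting value as follows.

```python
# effective_cache_size in PostgreSQL 7.x is given in 8 kB disk pages.
# Convert a target cache size in GB to the postgresql.conf value.
def effective_cache_size_pages(gigabytes: float, page_kb: int = 8) -> int:
    """Number of page_kb-sized units covering `gigabytes` of cache."""
    return int(gigabytes * 1024 * 1024 // page_kb)

print(effective_cache_size_pages(0.5))   # 65536  - matches Nick's current 512MB setting
print(effective_cache_size_pages(1.75))  # 229376 - the suggested ~1.75 gig target
```

That the current setting of 65536 falls out of the same formula for 512 MB is a useful sanity check on the units.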
Re: [PERFORM] Adding RAM: seeking advice & warnings of hidden "gotchas"
If you have 3 1.5GB tables then you might as well go for 4GB while you're at
it. Make sure you've got a bigmem kernel either running or available, and
boost effective_cache_size by whatever amount you increase the RAM by. We run
a Quad Xeon/4GB server on Redhat 7.3 and it's solid as a rock.

There is no way I know of to get indexes preferentially cached over data
though.

Matt

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Nick Fankhauser
> Sent: 17 December 2003 19:57
> To: [EMAIL PROTECTED] Org
> Subject: [PERFORM] Adding RAM: seeking advice & warnings of hidden
> "gotchas"
>
> Hi-
>
> After having done my best to squeeze better performance out of our
> application by tuning within our existing resources, I'm falling back on
> adding memory as a short-term solution while we get creative for a
> long-term fix. I'm curious about what experiences others have had with
> the process of adding big chunks of RAM. In particular, if I'm trying to
> encourage the OS to cache more of my index information in RAM, what sort
> of configuration should I do at both the PostgreSQL and OS level?
>
> In a slightly off-topic vein, I'd also like to hear about it if anyone
> knows about any gotchas at the OS level that might become a problem.
>
> The server is a dual processor Athlon 1.2GHz box with hardware SCSI RAID.
> It currently has 1 GB RAM, and we're planning to add one GB more for a
> total of 2GB. The OS is Debian Linux Kernel 2.4.x, and we're on
> PostgreSQL v7.3.2
>
> My current memory related settings are:
>
>   SHMMAX and SHMALL set to 128MB (OS setting)
>   shared buffers 8192 (64MB)
>   sort_mem 16384 (16MB)
>   effective_cache_size 65536 (512MB)
>
> We support up to 70 active users, sharing a connection pool of 16
> connections. Most of the queries center around 3 tables that are about
> 1.5 GB each.
>
> Thanks.
> -Nick
>
> ---------------------------------------------------------------
> Nick Fankhauser
> [EMAIL PROTECTED]  Phone 1.765.965.7363  Fax 1.765.962.9788
> doxpop - Court records at your fingertips - http://www.doxpop.com/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Why is restored database faster?
Dennis, Shridhar, and Neil,

Thanks for your input. Here are my responses:

I ran VACUUM FULL on the table in question. Although that did reduce "Pages"
and "UnUsed", the "SELECT *" query is still much slower on this installation
than in the new, restored one.

Old server:
# VACUUM FULL abc;
VACUUM
# VACUUM VERBOSE abc;
NOTICE:  --Relation abc--
NOTICE:  Pages 1526: Changed 0, Empty 0; Tup 91528; Vac 0, Keep 0, UnUsed 32.
         Total CPU 0.07s/0.52u sec elapsed 0.60 sec.
VACUUM

New server:
# VACUUM VERBOSE abc;
NOTICE:  --Relation abc--
NOTICE:  Pages 1526: Changed 0, Empty 0; Tup 91528; Vac 0, Keep 0, UnUsed 0.
         Total CPU 0.02s/0.00u sec elapsed 0.02 sec.
VACUUM

max_fsm_pages is at its default value, 1.

People don't have the practice of dumping and restoring just for the purpose
of improving performance, do they?

Neil asked how much disk space the database directory takes on each machine.
What directory is of interest? The whole thing takes up about 875 MB on each
machine.

-David

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Why is restored database faster?
On Thursday 18 December 2003 09:24, David Shadovitz wrote:
> Old server:
> # VACUUM FULL abc;
> VACUUM
> # VACUUM VERBOSE abc;
> NOTICE:  --Relation abc--
> NOTICE:  Pages 1526: Changed 0, Empty 0; Tup 91528; Vac 0, Keep 0, UnUsed
> 32. Total CPU 0.07s/0.52u sec elapsed 0.60 sec.
> VACUUM
>
> New server:
> # VACUUM VERBOSE abc;
> NOTICE:  --Relation abc--
> NOTICE:  Pages 1526: Changed 0, Empty 0; Tup 91528; Vac 0, Keep 0, UnUsed
> 0. Total CPU 0.02s/0.00u sec elapsed 0.02 sec.
> VACUUM
>
> max_fsm_pages is at its default value, 1.

Well, then the only issue left is file system defragmentation. Which file
system is this, anyway?

> People don't have the practice of dumping and restoring just for the
> purpose of improving performance, do they?

Well, at times it is required. Especially in an update-intensive
environment. And no database is immune to that.

> Neil asked how much disk space the database directory takes on each
> machine. What directory is of interest? The whole thing takes up about
> 875 MB on each machine.

That is fairly small, so it should not take much time. At a guess, the time
it takes to vacuum is more than the time to dump and reload.

Another quick way to defragment a file system is to copy the entire data
directory to another partition (shut down the postmaster first), delete it
from the original partition, and move it back. Contiguous writing to a
partition effectively defragments it. Try it and see if it helps. It could
be much less trouble than dump/restore.

HTH

Shridhar

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly