Re: [PERFORM] An "obvious" index not being used

2008-07-20 Thread Daniele Varrazzo

Tom Lane wrote:

Daniele Varrazzo <[EMAIL PROTECTED]> writes:
In my problem I had 2 tables: a small one (accounts) and a large one (foo). The 
way the query is written doesn't allow the stats from the large table to be 
used at all, unless the records from the small table are fetched. This is 
independent of the stats accuracy.


What the planner does is to assume an even distribution of the data in the 
joined fields.


Sir, you don't know what you're talking about.


This is probably correct; I am not familiar with the PG internals.

I was just reporting the analysis I proposed in my previous message in this 
thread (http://archives.postgresql.org/pgsql-performance/2008-06/msg00095.php). You 
gave me a hint about where the backend was failing to estimate correctly, and I 
guessed the strategy the backend could have used to reach that result - one that 
doesn't match the reality of my data set, but that I think matches the picture 
the backend could build from the stats data alone, without performing any further 
fetch.


Nobody refuted that message; of course, that may just be because it was 
laughable:


Daniele Varrazzo wrote:
> Tom Lane wrote:
>> Daniele Varrazzo <[EMAIL PROTECTED]> writes:
>>> There is an index in the field "foo.account_id" but is not used. The
>>> resulting query plan is:
>>
>>>   Aggregate  (cost=300940.70..300940.71 rows=1 width=0) (actual time=13412.088..13412.089 rows=1 loops=1)
>>>     ->  Hash IN Join  (cost=11.97..299858.32 rows=432953 width=0) (actual time=0.678..13307.074 rows=92790 loops=1)
>>>           Hash Cond: (foo.account_id = accounts.id)
>>>           ->  Seq Scan on foo  (cost=0.00..275591.14 rows=5313514 width=4) (actual time=0.014..7163.538 rows=5313514 loops=1)
>>
>> Well, if the estimate of 432953 rows selected were correct, it'd be
>> right not to use the index.  Fetching one row in ten is not a chore
>> for an indexscan.  (I'm not sure it'd prefer an indexscan even with an
>> accurate 92K-row estimate, but at least you'd be in the realm where
>> tweaking random_page_cost would make a difference.)
>
> Let me guess: because the accounts table has an estimated (and correct)
> guess of 22 records fetched out of 270 =~ 8%, it assumes that it will
> need to fetch 8% of the 5.3M records (which... yes, matches the
> estimate of 433K).

This is the idea I had about how the query planner behaved in that query, and 
why the query performs as I expect when the joined items are explicit. Was it 
wrong?
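
For clarity, here is a minimal sketch of the two query shapes being discussed 
(only the table and column names come from the plan above; the filter on 
accounts is a made-up placeholder):

  -- Form 1: the planner has to estimate the join selectivity from the
  -- statistics of both tables, without knowing which account ids will match.
  SELECT count(*)
    FROM foo
   WHERE account_id IN (SELECT id FROM accounts WHERE some_flag);

  -- Form 2: the matching ids are fetched first and written explicitly, so
  -- the planner can consult the per-value statistics on foo.account_id,
  -- which in my case gives a much lower (and more accurate) estimate.
  SELECT count(*)
    FROM foo
   WHERE account_id IN (1, 2, 3 /* ... the ids fetched beforehand */);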


Thank you very much. Again, the only reason I think I was right is that nobody 
refuted my previous email.


Regards,

--
Daniele Varrazzo - Develer S.r.l.
http://www.develer.com



Re: [PERFORM] 3ware vs Areca

2008-07-20 Thread Greg Smith

On Tue, 15 Jul 2008, Jeffrey Baker wrote:


Debian "etch", which has a 2.6.18 kernel.  I have contacted Areca
support (as well as the linux-scsi mailing list) and their responses
are usually either 1) upgrade the driver and/or firmware even though I
have the latest drivers and firmware


Well, technically you don't have the latest driver, because that's the one 
that comes with the latest Linux kernel.  I'm guessing you have RHEL5 here 
from the fact that you're using 2.6.18.  I have a CentOS5 system here 
with an Areca card in it.  I installed it initially with the stock 2.6.18 
kernel there, but it never worked quite right; all sorts of odd panics 
under heavy load.  All my problems went away just by moving to a generic 
2.6.22, released some time after the Areca card became more of a first-class 
citizen, actively maintained by the kernel developers themselves.
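
To double-check what's actually in use, a quick sketch with generic commands 
(this assumes the stock in-kernel arcmsr module rather than a vendor driver 
package):

  # running kernel release
  uname -r
  # version information for the in-tree Areca driver module
  modinfo arcmsr | grep -i version
  # driver/firmware messages logged at boot
  dmesg | grep -i arcmsr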


2) vague statements about the disk being incompatible with the 
controller.


That sort of situation is unfortunate, but I don't feel it's unique to 
Areca.  There are lots of reasons why some manufacturers end up with drives 
that don't work well with some controllers, and it is hard to assign blame 
when it happens.  There is something to be said for buying more integrated 
and tested systems; ultimately, if you build stuff from parts, you're kind 
of stuck being the QA, and that process presumes you may discover 
incompatible combinations and swap them out for ones that do work.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)

2008-07-20 Thread Greg Smith

On Sat, 19 Jul 2008, Stephane Bailliez wrote:


OS is Ubuntu 7.10 x86_64 running  2.6.22-14


Note that I've had some issues with the desktop Ubuntu kernel giving slower 
results in tests like this than the same kernel release using the stock 
kernel parameters.  I haven't had a chance yet to see how the server Ubuntu 
kernel fits into that, or exactly what the desktop one is doing wrong. 
Could be worse--if you were running 8.04 I expect your pgbench results 
would be downright awful.



data is on xfs noatime


While XFS has some interesting characteristics, make sure you're 
comfortable with the potential issues of the journaling approach used by that 
filesystem.  With ext3 you can choose whether or not to use the somewhat 
risky writeback behavior; with XFS you're stuck with it, as far as I know.  
A somewhat one-sided intro here is at 
http://zork.net/~nick/mail/why-reiserfs-is-teh-sukc
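
For comparison, the ext3 journal mode is just a mount option; a hypothetical 
/etc/fstab line (device and mount point are made up) might look like:

  # data=writeback is the faster but riskier ext3 journal mode;
  # the default, data=ordered, is the safer choice
  /dev/sdb1  /data/pgsql  ext3  noatime,data=writeback  0  2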



postgresql 8.2.9 with data and xlog as mentioned above


There are so many known performance issues in 8.2 that are improved in 8.3 
that you really should be considering 8.3 for a new install at this point.


The script runs over scaling factors 1 to 1000, running pgbench 3 times with 
"pgbench -t 2000 -c 8 -S pgbench"


In general, you'll want to use a couple of clients per CPU core for 
pgbench tests to get a true look at the scalability.  Unfortunately, the 
way the pgbench client runs means that it tends to top out at 20 or 30 
thousand TPS on read-only tests no matter how many cores you have around. 
But you may find operations where peak throughput comes closer to 32 
clients here rather than just 8.
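
For example, a sketch of repeating the select-only run at a few client counts 
(same options as the script quoted above):

  # rerun the read-only test with increasing client counts to find the peak
  for c in 8 16 24 32; do
      pgbench -t 2000 -c $c -S pgbench
  done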


It's a bit limited; I will try to do a much longer run, increase the number of 
tests, and calculate the mean and stddev, as I sometimes have a pretty large 
variation across the 3 runs (typically at scaling factor 1000 the runs are 
respectively 1952, 940 and 3162), so the graph is pretty ugly.


This is kind of a futile exercise and I wouldn't go crazy trying to 
analyze here.  Having been through that many times, I predict you'll 
discover no real value in a more statistically intense analysis.  It's not 
like sampling at more points makes the variation go away, or that the 
variation itself has some meaning worth analyzing.  Really, the goal of 
pgbench tests should be to look at a general trend.  Looking at your data, for 
example, I'd say the main useful observation to draw from your tests is 
that performance is steady and then drops off sharply once the database itself 
exceeds 10GB, which is a fairly positive statement that you're getting 
something out of most of the 16GB of RAM in the server during this 
test.


As far as the rest of your results go, Luke's comment that you may need 
more than one process to truly see the upper limit of your disk 
performance is right on target.  More useful commentary on that issue that 
I'd recommend is near the end of 
http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/


(man does that need to be a smaller URL)

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] log_statement at postgres.conf

2008-07-20 Thread System/IJS - Joko

Thanks a lot, Nicolas.

I finally succeeded in logging query statements thanks to your simple explanation.
I have some other questions:
1. Is there a possibility of automatically logging those statements to a table?
2. The logged statements come from every database on my server;
 could I know which database each statement comes from,
 or at least filter to log only from database X?
3. If I only need to log changes made to my database, should the value of 
'log_statement' be 'mod'?

CMIIW

Regards,
Joko [SYSTEM]
PT. Indra Jaya Swastika
Phone: +62 31 7481388  Ext 201
http://www.ijs.co.id

--sorry for my bad english

- Original Message - 
From: "Pomarede Nicolas" <[EMAIL PROTECTED]>

To: "System/IJS - Joko" <[EMAIL PROTECTED]>
Cc: 
Sent: Friday, July 18, 2008 3:16 PM
Subject: Re: [PERFORM] log_statement at postgres.conf


There are 2 points in your question:

 - what to log
 - where to log

To choose 'what' to log in your case, you can change 'log_statement' to 
'all'.


Then, to choose 'where' to log, you can either use the proposal in the 
first answer, or change 'log_destination' to 'stderr' and 
'redirect_stderr' to 'on'.
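
Putting that together, a minimal postgresql.conf sketch (8.2-era parameter 
names; the values are illustrative, and the log_line_prefix and 'mod' notes 
address questions 2 and 3 above):

  # what to log: 'all' logs every statement, 'mod' only data-modifying ones
  log_statement = 'all'

  # where to log: capture stderr into rotating files under pg_log/
  log_destination = 'stderr'
  redirect_stderr = on
  log_directory = 'pg_log'
  log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'

  # prefix each line with the database name, so per-database filtering is easy
  log_line_prefix = '%d '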


Nicolas



--
If you have any problem with our services ,
please contact us at 70468146 or e-mail: [EMAIL PROTECTED]
PT Indra Jaya Swastika | Jl. Kalianak Barat 57A | +62-31-7481388


