Re: [PERFORM] An "obvious" index not being used
Tom Lane wrote:
> Daniele Varrazzo <[EMAIL PROTECTED]> writes:
>> In my problem I had 2 tables: a small one (accounts), a large one (foo).
>> The way the query is written doesn't allow the stats from the large table
>> to be used at all, unless the records from the small table are fetched.
>> This is independent of the stats accuracy. What the planner does is to
>> assume an even distribution of the data in the joined fields.
>
> Sir, you don't know what you're talking about.

This is probably correct: I am not familiar with the PG internals. I was just
reporting the analysis I proposed in my previous message in this thread
(http://archives.postgresql.org/pgsql-performance/2008-06/msg00095.php). You
gave me a hint about where the backend's estimate was going wrong, and from
that I guessed at the strategy it could have used to reach that result - a
strategy that doesn't match the reality of my data set, but that I think
matches the picture it could build from the stats data alone, without
performing any further fetches.

Nobody refuted that message; of course, that may have been because it was
laughable:

Daniele Varrazzo wrote:
> Tom Lane wrote:
>> Daniele Varrazzo <[EMAIL PROTECTED]> writes:
>>> There is an index on the field "foo.account_id" but it is not used. The
>>> resulting query plan is:
>>>
>>> Aggregate  (cost=300940.70..300940.71 rows=1 width=0) (actual time=13412.088..13412.089 rows=1 loops=1)
>>>   ->  Hash IN Join  (cost=11.97..299858.32 rows=432953 width=0) (actual time=0.678..13307.074 rows=92790 loops=1)
>>>         Hash Cond: (foo.account_id = accounts.id)
>>>         ->  Seq Scan on foo  (cost=0.00..275591.14 rows=5313514 width=4) (actual time=0.014..7163.538 rows=5313514 loops=1)
>>
>> Well, if the estimate of 432953 rows selected were correct, it'd be
>> right not to use the index. Fetching one row in ten is not a chore
>> for an indexscan. (I'm not sure it'd prefer an indexscan even with an
>> accurate 92K-row estimate, but at least you'd be in the realm where
>> tweaking random_page_cost would make a difference.)
>
> Let me guess: because the accounts table has an estimated (and correct)
> guess of 22 records fetched out of 270 =~ 8%, it assumes that it will
> need to fetch 8% of the 5.3M records (which... yes, matches the
> estimate of 433K).

This is the idea I had about how the query planner behaved in that query, and
why the query performs as I expect when the joined items are listed
explicitly. Was it wrong?

Thank you very much. Again, the only reason I think I was right is that
nobody refuted my previous email.

Regards,

-- 
Daniele Varrazzo - Develer S.r.l.
http://www.develer.com
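[A minimal sketch of the two query shapes being discussed above, with the
arithmetic of the guess spelled out in comments. It is not from the original
thread; the filter on accounts is a placeholder, not the condition from the
original query.]

    -- Join form (what the "Hash IN Join" plan above shows): before the
    -- accounts rows are fetched, the planner can only scale foo by the
    -- fraction of accounts it expects to match:
    --   22 / 270 =~ 8%, and 8% of 5,313,514 =~ 433K rows (the 432953 estimate),
    -- far above the actual 92,790, so a sequential scan looks cheaper.
    SELECT count(*)
      FROM foo
     WHERE account_id IN (SELECT id FROM accounts WHERE /* placeholder */ true);

    -- Explicit form: with the matching ids listed explicitly (fetched first
    -- by the application), the planner can use the per-value statistics on
    -- foo.account_id, which matches the observation that the query behaves
    -- as expected when the joined items are listed explicitly.
    SELECT count(*)
      FROM foo
     WHERE account_id IN (1, 2, 3 /* ... the 22 ids; placeholders */);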
Re: [PERFORM] 3ware vs Areca
On Tue, 15 Jul 2008, Jeffrey Baker wrote:
> Debian "etch", which has a 2.6.18 kernel. I have contacted Areca support
> (as well as the linux-scsi mailing list) and their responses are usually
> either 1) upgrade the driver and/or firmware even though I have the latest
> drivers and firmware

Well, technically you don't have the latest driver, because that's the one
that comes with the latest Linux kernel. I'm guessing you have RHEL5 here
from the fact that you're using 2.6.18. I have a CentOS5 system here with an
Areca card in it. I installed it initially with the stock 2.6.18 kernel
there, but it never worked quite right; all sorts of odd panics under heavy
load. All my problems went away just by moving to a generic 2.6.22, released
some time after the Areca card became more of a first-class citizen, actively
maintained by the kernel developers themselves.

> 2) vague statements about the disk being incompatible with the controller.

That sort of situation is unfortunate, but I don't feel it's unique to Areca.
There are lots of reasons why some manufacturers end up with drives that
don't work well with some controllers, and it is hard to assign blame when it
happens. There is something to be said for buying more integrated and tested
systems; ultimately, if you build stuff from parts, you're kind of stuck
being the QA, and that process presumes you may discover incompatible
combinations and have to swap them out for ones that do work.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)
On Sat, 19 Jul 2008, Stephane Bailliez wrote:
> OS is Ubuntu 7.10 x86_64 running 2.6.22-14

Note that I've had some issues with the desktop Ubuntu giving slower results
in tests like this than the same kernel release using the stock kernel
parameters. I haven't had a chance yet to see how the server Ubuntu kernel
fits into that, or exactly what the desktop one is doing wrong. Could be
worse--if you were running any 8.04, I expect your pgbench results would be
downright awful.

> data is on xfs noatime

While XFS has some interesting characteristics, make sure you're comfortable
with the potential issues of the journaling approach that filesystem uses.
With ext3 you can choose whether or not to take the somewhat risky writeback
behavior; with XFS you're stuck with it, as far as I know. A somewhat
one-sided intro is at http://zork.net/~nick/mail/why-reiserfs-is-teh-sukc

> postgresql 8.2.9 with data and xlog as mentioned above

There are so many known performance issues in 8.2 that are improved in 8.3
that I'd suggest you really should be considering 8.3 for a new install at
this point.

> Script running over scaling factor 1 to 1000 and running pgbench 3 times
> with "pgbench -t 2000 -c 8 -S pgbench"

In general, you'll want to use a couple of clients per CPU core for pgbench
tests to get a true look at the scalability; see the sketch after this
message. Unfortunately, the way the pgbench client runs means that it tends
to top out at 20 or 30 thousand TPS on read-only tests no matter how many
cores you have around. But you may find operations where peak throughput
comes at closer to 32 clients here rather than just 8.

> It's a bit limited; I will try to do a much longer run, increase the
> number of tests, and calculate mean and stddev, as I sometimes have a
> pretty large variation over the 3 runs (typically at scaling factor 1000
> the runs are respectively 1952, 940, 3162), so the graph is pretty ugly.

This is kind of a futile exercise and I wouldn't go crazy trying to analyze
it here. Having been through that many times, I predict you'll find no real
value in a more statistically intense analysis. It's not as if sampling at
more points makes the variation go away, or that the variation itself has
some meaning worth analyzing. Really, the goal of pgbench tests should be to
look at the general trend. Looking at your data, for example, I'd say the
main useful observation to draw from your tests is that performance is steady
and then drops off sharply once the database itself exceeds 10GB, which is a
fairly positive statement that you're getting something out of most of the
16GB of RAM in the server during this test.

As far as the rest of your results go, Luke's comment that you may need more
than one process to truly see the upper limit of your disk performance is
right on target. More useful commentary on that issue that I'd recommend is
near the end of
http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/
(man, does that need to be a shorter URL)

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
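[A minimal sketch of the client sweep being suggested above, assuming the
same select-only test; the specific client counts are illustrative (roughly a
couple of clients per core on an 8-core box), not taken from the original
post.]

    # run the same read-only test at several client counts and compare TPS
    for c in 4 8 16 32; do
        pgbench -S -c $c -t 2000 pgbench
    done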
Re: [PERFORM] log_statement at postgres.conf
Thanks a lot Nicolas, I finally succeeded in logging the query statements
thanks to your simple explanation.

I have some other questions:
1. Is there a way to automatically log those statements to a table?
2. The statements come from every database on my server; can I tell which
   database a statement came from, or at least filter the log so that only
   database X is logged?
3. If I need to log only the changes made to my database, then the value of
   'log_statement' is 'mod'? CMIIW

Regards,
Joko

[SYSTEM] PT. Indra Jaya Swastika
Phone: +62 31 7481388 Ext 201
http://www.ijs.co.id
-- sorry for my bad english

----- Original Message -----
From: "Pomarede Nicolas" <[EMAIL PROTECTED]>
To: "System/IJS - Joko" <[EMAIL PROTECTED]>
Cc:
Sent: Friday, July 18, 2008 3:16 PM
Subject: Re: [PERFORM] log_statement at postgres.conf

There are 2 points in your question:
- what to log
- where to log

To choose 'what' to log in your case, you can change 'log_statement' to
'all'. Then, to choose 'where' to log, you can either use the proposal in the
first answer, or change 'log_destination' to 'stderr' and 'redirect_stderr'
to 'on'.

Nicolas
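[A sketch of the settings these three questions point at, assuming an 8.2 or
8.3 era server; the database name 'mydb', the table name 'postgres_log', and
the file path are placeholders, not values from the thread.]

    -- (2) Per-database filtering: a superuser can attach the setting to a
    --     single database so only that database's sessions log statements.
    --     (log_statement is superuser-only; verify the per-database value is
    --     actually picked up for your application's connections.)
    ALTER DATABASE mydb SET log_statement = 'mod';

    -- (3) 'mod' logs DDL plus data-changing statements (INSERT, UPDATE,
    --     DELETE, and the like), not plain SELECTs.

    -- (2, alternative) To see which database each log line came from, add
    --     %d (database name) to log_line_prefix in postgresql.conf, e.g.:
    --       log_line_prefix = '%t %d '

    -- (1) There is no built-in "log to table". From 8.3 on, the server can
    --     write CSV-format logs (log_destination = 'csvlog' together with
    --     redirect_stderr = on), and those files can then be loaded into a
    --     table whose columns match the CSV log format:
    COPY postgres_log FROM '/path/to/postgresql.csv' WITH CSV;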