On Tue, Apr 19, 2016 at 1:07 AM, Aldo Sarmiento wrote:
> Hello,
>
> I'm assuming this topic has probably been bludgeoned to a pulp, but my
> google-fu can't seem to find a solution.
>
> I have two relatively largish tables that I'm trying to join that result in
> a slow query.
>
> Hardware:
>
> 20
Hello,
I'm assuming this topic has probably been bludgeoned to a pulp, but my
google-fu can't seem to find a solution.
I have two relatively largish tables that I'm trying to join that result in
a slow query.
Hardware:
2014 iMac w/ SSD & i5 processor
Tables:
contacts: 1.14 million rows
permiss
Tomas Vondra wrote:
> Yes, that's clearly the culprit here. In both cases we estimate there are
> only ~4000 tuples in the hash, and 9.3 sizes the hash table to have at
> most ~10 tuples per bucket (in a linked list).
>
> However we actually get ~3M rows, so there will be ~3000 tuples per
> bucket,
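As a rough sketch of that arithmetic (assuming the 9.3-era limit of about 10 tuples per bucket and a bucket count rounded up to a power of two): ~4000 estimated tuples / 10 gives on the order of a thousand buckets, but ~3,000,000 actual rows spread over ~1000 buckets leaves roughly 3000 tuples chained per bucket, so every probe walks a ~3000-entry linked list instead of the intended ~10.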
On 02/01/2016 10:38 AM, Albe Laurenz wrote:
Tomas Vondra wrote:
...
I didn't post the whole plan since it is awfully long; I'll include hyperlinks
to it instead.
work_mem = '100MB' (http://explain.depesz.com/s/7b6a):
-> Hash Join (cost=46738.74..285400.61 rows=292 width=8) (actual
tim
Tomas Vondra wrote:
> On 01/29/2016 04:17 PM, Albe Laurenz wrote:
>> I have a query that runs *slower* if I increase work_mem.
>>
>> The execution plans are identical in both cases, except that a temp file
>> is used when work_mem is smaller.
>> What could be an explanation for this?
>> Is this kn
Hi,
On 01/29/2016 04:17 PM, Albe Laurenz wrote:
I have a query that runs *slower* if I increase work_mem.
The execution plans are identical in both cases, except that a temp file
is used when work_mem is smaller.
The relevant lines of EXPLAIN ANALYZE output are:
With work_mem='100MB':
-> Has
Hi
> I ran operf on both backends, and they look quite similar, except that the
> number of samples is different (this is "opreport -c" output):
>
> CPU: Intel Sandy Bridge microarchitecture, speed 2899.8 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a uni
I have a query that runs *slower* if I increase work_mem.
The execution plans are identical in both cases, except that a temp file
is used when work_mem is smaller.
The relevant lines of EXPLAIN ANALYZE output are:
With work_mem='100MB':
-> Hash Join (cost=46738.74..285400.61 rows=292 width=8)
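A self-contained way to reproduce that kind of side-by-side comparison in a single session is sketched below; the tables are toy ones built with generate_series (names and sizes are made up, not the tables from this report):

CREATE TEMP TABLE t1 AS
  SELECT g AS id, md5(g::text) AS pad FROM generate_series(1, 1000000) g;
CREATE TEMP TABLE t2 AS
  SELECT g AS id, md5(g::text) AS pad FROM generate_series(1, 1000000) g;
ANALYZE t1;
ANALYZE t2;

SET work_mem = '4MB';    -- hash of ~1M rows will not fit: several batches, temp files
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t1 JOIN t2 USING (id);

SET work_mem = '256MB';  -- the whole hash fits in memory: a single batch
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t1 JOIN t2 USING (id);

The Batches and Memory Usage figures reported on the Hash node show where the spill happens.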
Tom Lane writes:
> I'd bet on the extra time being in I/O for the per-batch temp files, since
> it's hard
> to see what else would be different if the data were identical in each run.
> Maybe the kernel is under memory pressure and is dropping the file data from
> in-memory disk cache. Or maybe i
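A low-effort way to test that theory, as a hedged suggestion: have the server log every temporary file the query creates and compare the two runs (log_temp_files is a superuser/postgresql.conf setting; 0 means log all temp files regardless of size).

SET log_temp_files = 0;
-- re-run the query under both work_mem settings, then compare the
-- "temporary file: ... size ..." lines that show up in the server log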
Dave Roberge writes:
> For example, running explain (analyze, buffers) with the query, 4/5 times we
> will see the following:
> -> Hash Join (cost=16385.76..103974.09 rows=523954 width=64) (actual
> time=532.634..4018.678 rows=258648 loops=1)
> Hash Cond: (p.a = c.c)
> Buffers: sh
Hi, I'm in the process of attempting to tune some slow queries. I came
across a scenario where I'm not entirely sure how I
might figure out why a node is taking a while to process. I'm not concerned
with the query itself, we are working to figure
out how we can make it faster. But I was hoping s
Hello everyone,
I'm looking for a way to specify the join order in an SQL query. Actually, the
optimizer has chosen a plan with a hash join of 2 tables, but I would like
to know if there is a way to force it to use a hash join with the build and
probe tables swapped.
Thank you,
Mirko Spasic
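As a hedged note on that question: core PostgreSQL has no hint for choosing the hash build side; the planner normally hashes whichever input it estimates to be the smaller/cheaper one, so the practical levers are accurate row estimates and the cost settings. A per-session sketch (t_big, t_small and the id columns are invented names):

-- improve the estimate on the join key of the table you want hashed
ALTER TABLE t_small ALTER COLUMN id SET STATISTICS 1000;
ANALYZE t_small;

-- or simply see what plan the optimizer picks when a hash join is off the table
SET enable_hashjoin = off;
EXPLAIN SELECT * FROM t_big b JOIN t_small s ON s.id = b.s_id;
RESET enable_hashjoin;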
On 21 December 2012 01:06, Kevin Grittner wrote:
> Huan Ruan wrote:
> > Kevin Grittner wrote:
>
> >> Frankly, at 12 microseconds per matched pair of rows, I think
> >> you're doing OK.
> >
> > This plan is the good one, I want the indexscan nested loop join and this
> > is only achieved after mak
Huan Ruan wrote:
> Kevin Grittner wrote:
>> Frankly, at 12 microseconds per matched pair of rows, I think
>> you're doing OK.
>
> This plan is the good one, I want the indexscan nested loop join and this
> is only achieved after making all these costing factors change. Before
> that, it was hash j
Frankly, at 12 microseconds per matched pair of rows, I think
> you're doing OK.
>
> This plan is the good one; I want the indexscan nested loop join, and this
is only achieved after changing all these costing factors. Before
that, it was hash join and was very slow.
However, I'm worried about
"Huan Ruan" wrote:
> explain (analyze, buffers)
> SELECT
> *
> FROM IM_Match_Table smalltable
> inner join invtran bigtable on bigtable.invtranref = smalltable.invtranref
Well, one table or the other will need to be read in full, and you
would normally want that one to be the small table. When
> Quite possibly, but it could be any of a number of other things,
> like a type mismatch. It might be best to rule out other causes. If
> you post the new query and EXPLAIN ANALYZE output, along with the
> settings you have now adopted, someone may be able to spot
> something. It wouldn't hurt to
Huan Ruan wrote:
> Kevin Grittner wrote:
>> With a low cache hit rate, that would generally be when the number
>> of lookups into the table exceeds about 10% of the table's rows.
>
> So far, my main performance issue comes down to this pattern where
> Postgres chooses hash join that's slower than
>
> With a low cache hit rate, that would generally be when the number
> of lookups into the table exceeds about 10% of the table's rows.
>
>
> So far, my main performance issue comes down to this pattern where
Postgres chooses a hash join that's slower than a nested loop indexed join. By
changing thos
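For anyone following the thread, a hedged sketch of the per-session knobs usually meant by "these costing factors"; the values are only illustrative and assume a largely cached data set like the one discussed here:

SET effective_cache_size = '20GB';  -- reflect how much data the OS is likely caching
SET random_page_cost = 2;           -- cheaper random reads make index probes look better
SET cpu_tuple_cost = 0.03;          -- dearer per-tuple work penalises scanning the huge table
-- then re-run EXPLAIN ANALYZE on the original join and compare the chosen plan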
Huan Ruan wrote:
> Interesting to see how you derived 100% cache hits. I assume by 'cache' you
> mean the pg shared buffer plus the OS cache? Because the table is 23GB but
> the shared buffer is only 6GB. Even then, I'm not completely convinced
> because the total RAM is just 24GB, part of which w
Hi Kevin
Again, many thanks for your time and help.
On 14 December 2012 02:26, Kevin Grittner wrote:
> Huan Ruan wrote:
>
> > Hash 1st run
>
> > "Hash Join (cost=1681.87..6414169.04 rows=48261 width=171)
> > (actual time=2182.450..88158.645 rows=48257 loops=1)"
>
> > " -> Seq Scan on invtran bi
Hi Kevin
On 13 December 2012 10:47, Kevin Grittner wrote:
> Huan Ruan wrote:
>
> > is a lot slower than a nested loop join.
>
> Giving actual numbers is more useful than terms like "a lot". Even
> better is to provide the output of EXPLAIN ANALYZE rather than
> just EXPLAIN. This shows estimate
On 13 December 2012 03:28, Jeff Janes wrote:
>
> This looks like the same large-index over-penalty as discussed in the
> recent thread "[PERFORM] Slow query: bitmap scan troubles".
>
> Back-patching the log(npages) change is starting to look like a good idea.
>
> Cheers,
>
> Jeff
Thanks for the
Huan Ruan wrote:
> Hash 1st run
> "Hash Join (cost=1681.87..6414169.04 rows=48261 width=171)
> (actual time=2182.450..88158.645 rows=48257 loops=1)"
> " -> Seq Scan on invtran bigtable (cost=0.00..4730787.28
> rows=168121728 width=108) (actual time=0.051..32581.052
> rows=168121657 loops=1)"
19
Huan Ruan wrote:
> is a lot slower than a nested loop join.
Giving actual numbers is more useful than terms like "a lot". Even
better is to provide the output of EXPLAIN ANALYZE rather than
just EXPLAIN. This shows estimates against actual numbers, and gives
timings. For more suggestions see this
On Tue, Dec 11, 2012 at 8:25 PM, Huan Ruan wrote:
> Hello All
>
> While investigating switching to Postgres, we came across a query plan that
> uses hash join and is a lot slower than a nested loop join.
>
> I don't understand why the optimiser chooses the hash join in favor of the
> nested loop.
On Dec 12, 2012, at 8:57 AM, Evgeny Shishkin wrote:
>
> On Dec 12, 2012, at 8:44 AM, Huan Ruan wrote:
>
>>
>> On 12 December 2012 15:33, Evgeny Shishkin wrote:
>> Optimiser thinks that nested loop is more expensive, because of point PK
>> lookups, which are random I/O.
>> Can you set random_p
On Dec 12, 2012, at 8:25 AM, Huan Ruan wrote:
> Hello All
>
> While investigating switching to Postgres, we came across a query plan that
> uses hash join and is a lot slower than a nested loop join.
>
> I don't understand why the optimiser chooses the hash join in favor of the
> nested loop
Hello All
While investigating switching to Postgres, we came across a query plan that
uses hash join and is a lot slower than a nested loop join.
I don't understand why the optimiser chooses the hash join in favor of the
nested loop. What can I do to get the optimiser to make a better decision
(n
On Fri, Mar 13, 2009 at 7:08 PM, Tom Lane wrote:
> Vamsidhar Thummala writes:
> > I am wondering why are we subtracting the entire Seq Scan time of
> Lineitem
> > from the total time to calculate the HashJoin time.
>
> Well, if you're trying to identify the speed of the join itself and not
> how l
Vamsidhar Thummala writes:
> I am wondering why are we subtracting the entire Seq Scan time of Lineitem
> from the total time to calculate the HashJoin time.
Well, if you're trying to identify the speed of the join itself and not
how long it takes to provide the input for it, that seems like a
se
Thanks for such a quick response.
On Fri, Mar 13, 2009 at 5:34 PM, Tom Lane wrote:
> > 2) Why is the Hash Join (top most) so slow?
>
> Doesn't look that bad to me. The net time charged to the HashJoin node
> is 186107.210 - 53597.555 - 112439.592 = 20070.063 msec. In addition it
> would be reason
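Stated generally, the net (self) time of a plan node is roughly its actual total time minus the actual total time of each direct child multiplied by that child's loop count; the subtraction above is exactly that, with both inputs running a single loop.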
Vamsidhar Thummala writes:
> 1) The actual time on Seq Scan on Lineitem shows that the first record is
> fetched at time 0.022ms and the last record is fetched at 53.5s. Does it
> mean the sequential scan is completed with-in first 53.4s (absolute time)?
No, it means that we spent a total of 53.5
From the documentation, I understand that the range of actual time represents
the time taken for retrieving the first result and the last result
respectively. However, the following output of explain analyze confuses me:
GroupAggregate (cost=632185.58..632525.55 rows=122884 width=57)
(actual time=18
Tomas,
OK, what was the number of unused pointer items in the VACUUM output?
I posted it in this thread:
VACUUM FULL ANALYZE VERBOSE;
...
INFO: free space map contains 14353 pages in 314 relations
DETAIL: A total of 2 page slots are in use (including overhead).
89664 page slots are requ
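The fix that output points at is raising max_fsm_pages in postgresql.conf comfortably above the 89664 slots VACUUM says it needs, then restarting the server; the value below is only illustrative:

max_fsm_pages = 120000        # must exceed the 89664 page slots reported as required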
>> Given the fact that the performance issues are caused by bloated tables
>> and / or slow I/O subsystem, moving to a similar system won't help I
>> guess.
>
> I have ran VACUUM FULL ANALYZE VERBOSE
> and set MAX_FSM_PAGES = 15
>
> So there is no bloat except the pg_shdepend indexes, which shou
Tomas,
Let's suppose you set a reasonable value (say 8096) instead of 2GB. That
gives about 160MB.
Anyway this depends - if you have a lot of slow queries caused by on-disk
sorts / hash tables, use a higher value. Otherwise leave it as it is.
Probably product orders table is frequently joined
Scott,
thank you.
> work_mem = 512
This is very easy to try. You can change work_mem for just a single
session, and this can in some cases help performance quite a bit, and in
others not at all.
I would not recommend having it lower than at least 4MB on a server like
that unless you have a lo
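A minimal sketch of that per-session experiment (8.1 takes work_mem in kB with no unit suffix; the value is just an example in line with the 4MB floor suggested above):

SET work_mem = 8192;   -- 8 MB, affects only the current session
-- run the slow report query here and note the timing
RESET work_mem;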
My test computer has PostgreSQL 8.3, 4 GB RAM, SSD disks, and an Intel X2Extreme
CPU, so it is much faster than this prod server.
No idea how to emulate this environment.
I can create a new db on the prod server as a copy of the old one, but it can
be used late at night only.
Well, a faster but comparable system may not
I guess you have backups - take them, restore the database on a different
machine (preferably with the same / similar hw config) and tune the
queries on it.
After restoring, all the tables / indexes will be 'clean' (not bloated), so
you'll see if performing VACUUM FULL / CLUSTER is the right solut
> Approaches which probably do not change performance:
> 6. Upgrade to 8.4 or to 8.3.5
Both of these will improve performance a little, even with the same query plan
and same data. I would expect about a 10% improvement for 8.3.x on most memory
bound select queries. 8.4 won't be out for a few
Risky to try on the prod server. Requires creating a randomly distributed
product_id test case to measure the difference.
What should I do next?
I guess you have backups - take them, restore the database on a
different machine (preferably with the same / similar hw config) and
tune the queries on it.
You could try writing a plpgsql function which would generate the data
set.
Or you could use your existing data set.
Creating 3.5 mln rows using a stored proc is probably slow.
Probably it would be better and faster to use some random() and
generate_series() trick.
In this case others can try it a
Thank you very much for the great sample.
I tried to create a test case from this to match the production db:
1.2 million orders
3.5 million order details
13400 products with char(20) primary keys containing mostly ean-13 codes
last 3 years of data
every order usually has 1..3 detail lines
same product
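A hedged sketch of the random()/generate_series() approach suggested earlier, sized to roughly match that description; all table and column names here are invented for illustration:

CREATE TABLE products AS
  SELECT lpad(g::text, 13, '0')::char(20) AS product_id      -- ean-13-style keys in char(20)
  FROM generate_series(1, 13400) g;

CREATE TABLE orders AS
  SELECT g AS order_id,
         current_date - (random() * 3 * 365)::int AS order_date   -- ~3 years of dates
  FROM generate_series(1, 1200000) g;

CREATE TABLE order_details AS
  SELECT g AS detail_id,
         (random() * 1199999)::int + 1 AS order_id,                -- ~3 lines per order on average
         lpad(((random() * 13399)::int + 1)::text, 13, '0')::char(20) AS product_id
  FROM generate_series(1, 3500000) g;

ANALYZE products; ANALYZE orders; ANALYZE order_details;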
You could perhaps run a little check on the performance of the RAID: is
it better than Linux software RAID ?
Does it leverage NCQ appropriately when running queries in parallel ?
I was told that this RAID is software RAID.
I have no experience with what to check.
This HP server was installed 3 years
On Fri, 21 Nov 2008 21:07:02 +0100, Tom Lane <[EMAIL PROTECTED]> wrote:
PFC <[EMAIL PROTECTED]> writes:
Index on orders_products( product_id ) and orders_products( order_id ):
=> Same plan
Note that in this case, a smarter planner would use the new index to
perform a BitmapAn
Subject: Re: [PERFORM] Hash join on int takes 8..114 seconds
> If it's not a million rows, then the table is bloated. Try (as postgres
> or some other db superuser) "vacuum full pg_shdepend" and a "reindex
> pg_shdepend".
reindex table pg_shdepend
If it's not a million rows, then the table is bloated. Try (as postgres
or some other db superuser) "vacuum full pg_shdepend" and a "reindex
pg_shdepend".
reindex table pg_shdepend causes error
ERROR: shared table "pg_shdepend" can only be reindexed in stand-alone mode
vacuum full verbose pg_
2. Run the following commands periodically in this order:
VACUUM FULL;
vacuum full pg_shdepend;
CLUSTER rid on (toode);
CLUSTER dok on (kuupaev);
REINDEX DATABASE mydb;
REINDEX SYSTEM mydb;
ANALYZE;
Are all those commands required, or can something be left out?
Running CLUSTER after VACUUM FULL
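As a hedged aside on that question: CLUSTER already rewrites the table and rebuilds all of its indexes, so a prior VACUUM FULL and a later REINDEX of the clustered tables add nothing. A reduced pass might look like the sketch below; the index names are invented, and reindexing the shared pg_shdepend catalog still needs stand-alone mode on 8.1, as noted elsewhere in the thread:

CLUSTER rid_toode_idx ON rid;      -- pre-8.3 syntax: CLUSTER indexname ON tablename
CLUSTER dok_kuupaev_idx ON dok;
VACUUM FULL pg_shdepend;           -- the bloated shared catalog, superuser only
ANALYZE;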
Thank you.
My 8.1.4 postgresql.conf does not contain such an option, so
vacuum_cost_delay is probably off.
Since doc does not recommend any value, I planned to use 2000
Will a value of 30 allow other clients to work while VACUUM FULL is running?
No, as someone already noted the VACUUM FULL is bloc
Andrus wrote:
>> So I gather you're not doing any vacuuming, eh?
>
> Log files for every day are full of garbage messages below.
> So I hope that vacuum is running well, isn't it ?
This does not really mean that autovacuum has done anything in the
databases. If the times are consistently separat
Alvaro,
1. vacuum_cost_delay does not affect vacuum full
2. vacuum full is always blocking, regardless of settings
So the only way is to disable other database access if VACUUM FULL is required.
So I gather you're not doing any vacuuming, eh?
Log files for every day are full of garbage messages
PFC <[EMAIL PROTECTED]> writes:
> Index on orders_products( product_id ) and orders_products( order_id ):
> => Same plan
> Note that in this case, a smarter planner would use the new index to
> perform a BitmapAnd before hitting the heap to get the rows.
Considering that the query h
Andrus wrote:
> Will a value of 30 allow other clients to work while VACUUM FULL is running?
1. vacuum_cost_delay does not affect vacuum full
2. vacuum full is always blocking, regardless of settings
So I gather you're not doing any vacuuming, eh?
--
Alvaro Herrera
Alvaro,
Are you really using vacuum_cost_delay=2000? If so, therein lies your
problem. That's a silly value to use for that variable. Useful values
are in the 20-40 range probably, or maybe 10-100 being extremely
generous.
Thank you.
My 8.1.4 postgresql.conf does not contain such an option. So
Andrus wrote:
> I discovered vacuum_cost_delay=2000 option. Will this remove blocking
> issue and allow vacuum full to work ?
No.
Are you really using vacuum_cost_delay=2000? If so, therein lies your
problem. That's a silly value to use for that variable. Useful values
are in the 20-40 range
On Friday 21 November 2008, "Andrus" <[EMAIL PROTECTED]> wrote:
> Those commands probably cause the server to stop responding to other clients,
> like vacuum full pg_shdepend
> did.
>
> Should vacuum_cost_delay = 2000 allow other users to work when running
> those commands ?
Any vacuum full or cluster w
How to vacuum full pg_shdepend automatically so that other users can
work at the same time?
Your table is horribly bloated.
You must use VACUUM FULL + REINDEX (as superuser) on it, however
unfortunately, it is blocking.
Therefore, you should wait for Sunday night to do this, when no one will
notic
The log file seems to show that mostly only these queries are slow:
SELECT ...
FROM dok JOIN rid USING (dokumnr)
JOIN ProductId USING (ProductId)
WHERE rid.ProductId LIKE :p1 || '%' AND dok.SaleDate>=:p2
:p1 and :p2 are parameters different for different queries.
dok contains several years of d
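For that filter shape, a hedged sketch of indexes that usually help; the index names are invented, and bpchar_pattern_ops is assumed because ProductId is described elsewhere in the thread as char(20) (a *_pattern_ops opclass is what lets a btree serve LIKE 'prefix%' under a non-C locale):

CREATE INDEX rid_productid_prefix_idx ON rid (ProductId bpchar_pattern_ops);
CREATE INDEX dok_saledate_idx ON dok (SaleDate);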
Thomas,
Thank you.
Just the most important points:
1) "dok" table contains 1235086 row versions in 171641 pages (with 8kB
pages this means 1.4GB MB of data), but there are 1834279 unused item
pointers (i.e. about 60% of the space is wasted)
2) "rid" table contains 3275189 roiws in 165282 (wit
Server has 2 GB RAM.
It has SATA RAID 0,1 integrated controller (1.5Gbps) and SAMSUNG HD160JJ
mirrored disks.
You could perhaps run a little check on the performance of the RAID: is
it better than Linux software RAID ?
Does it leverage NCQ appropriately when running queries in para
Richard,
Thank you.
Try "SELECT count(*) FROM pg_shdepend".
This query returns 3625 and takes 35 seconds to run.
If it's not a million rows, then the table is bloated. Try (as postgres
or some other db superuser) "vacuum full pg_shdepend" and a "reindex
pg_shdepend".
vacuum full verbose
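A quick hedged cross-check of that diagnosis, using the size functions available from 8.1 on: 3625 rows should occupy a few hundred kB at most, so a figure in the gigabytes here confirms the bloat.

SELECT pg_size_pretty(pg_total_relation_size('pg_shdepend')) AS on_disk,
       count(*) AS live_rows
FROM pg_shdepend;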
Andrus wrote:
>> - what's the size of the dataset relative to the RAM ?
>
> Db size is 7417 MB
> relevant table sizes in descending order by size:
>
> 140595 dok 2345 MB
> 2 1214 pg_shdepend 2259 MB
> 6 123
Just the most important points:
1) "dok" table contains 1235086 row versions in 171641 pages (with 8kB
pages this means 1.4GB MB of data), but there are 1834279 unused item
pointers (i.e. about 60% of the space is wasted)
2) "rid" table contains 3275189 roiws in 165282 (with 8kB pages this means
PFC,
thank you.
OK so vmstat says you are IO-bound, this seems logical if the same plan
has widely varying timings...
Let's look at the usual suspects :
- how many dead rows in your tables ? are your tables data, or bloat ?
(check vacuum verbose, etc)
set search_path to firma2,public;
vacuu
Richard,
In addition to "top" below, you'll probably find "vmstat 5" useful.
Thank you.
During this query run (65 sec), vmstat 5 shows big values in the bi, cs and wa
columns:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si
OK so vmstat says you are IO-bound, this seems logical if the same plan
has widely varying timings...
Let's look at the usual suspects :
- how many dead rows in your tables ? are your tables data, or bloat ?
(check vacuum verbose, etc)
- what's the size of the dataset re
Andrus wrote:
> Richard,
>
>> At a quick glance, the plans look the same to me. The overall costs are
>> certainly identical. That means whatever is affecting the query times it
>> isn't the query plan.
>>
>> So - what other activity is happening on this machine? Either other
>> queries are taking
Richard,
At a quick glance, the plans look the same to me. The overall costs are
certainly identical. That means whatever is affecting the query times it
isn't the query plan.
So - what other activity is happening on this machine? Either other
queries are taking up noticeable resources, or some
Andrus wrote:
> Query below seems to use indexes everywhere in most optimal way.
> dokumnr column is of type int
>
> Speed of this query varies rapidly:
>
> In live db fastest response I have got is 8 seconds.
> Re-running same query after 10 seconds may take 60 seconds.
> Re-running it again af
Just a question, what are you doing with the 20.000 result rows ?
Those rows represent monthly sales data of one item.
They are used as follows:
1. Detailed sales report for a month. This report can be browsed on screen for
monthly sales and ordering analysis.
2. Total reports. In those reports,
The query below seems to use indexes everywhere in the most optimal way.
The dokumnr column is of type int.
The speed of this query varies wildly:
in the live db the fastest response I have got is 8 seconds.
Re-running the same query after 10 seconds may take 60 seconds.
Re-running it again after 10 seconds may take 114 s