Re: [PERFORM] Shared memory for large PostGIS operations

2012-03-17 Thread Andy Colson

On 03/16/2012 05:30 PM, Kevin Grittner wrote:

Brian Hamlin  wrote:

On Mar 16, 2012, at 7:17 AM, Kevin Grittner wrote:

Andy Colson  wrote:


I tried shared_buffers at both 2400M and 18000M, and it took 4.5
hours both times.  ... (weak attempts at humor omitted) 
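
(As an aside, a minimal way to confirm what the server is actually using:
shared_buffers is read from postgresql.conf, e.g. shared_buffers = 2400MB,
and only changes after a restart.  The check below is just an illustration.)

   -- Run from psql; shows the value the running server was started with.
   SHOW shared_buffers;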


Ah, I didn't pick up on the attempts at humor; perhaps that's why
you mistook something I said as an attempt at an insult.


It wasn't you, Kevin; it was me who insulted him (although I was trying to be
funny, not mean).

Sorry again Brian.

-Andy



Re: [PERFORM] Shared memory for large PostGIS operations

2012-03-17 Thread Kevin Grittner
Andy Colson  wrote:
On 03/16/2012 05:30 PM, Kevin Grittner wrote:
 
>> Ah, I didn't pick up on the attempts at humor; perhaps that's why
>> you mistook something I said as an attempt at an insult.
>
> It wasn't you, Kevin; it was me who insulted him (although I was
> trying to be funny, not mean).
 
Adding to the confusion, I think I missed one of the emails/posts.
Oh well, it sounds like people mostly need to use more smiley-faces,
since humor is so easy to miss in this medium.
 
Brian, I hope this doesn't put you off from posting -- we try to be
helpful here.
 
-Kevin



[PERFORM] slow self-join query

2012-03-17 Thread Robert Poor
[Disclaimer: this is a re-post, since I wasn't subscribed the first
time I posted.  Pardon if this is a duplicate.]

The following query is abysmally slow (e.g. 7 hours+).  The goal is to
find "among the users that follow user #1, who do they also follow?"
and to count the latter.

   SELECT L2.followed_id as followed_id, COUNT(U1.id) AS count
 FROM users AS U1
INNER JOIN links AS L1 ON L1.follower_id = U1.id
INNER JOIN links AS L2 ON L2.follower_id = U1.id
WHERE U1.type = 'User::Twitter'
  AND L1.followed_id = 1
 GROUP BY L2.followed_id
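
(For reference, an equivalent way to phrase the same question is to pre-filter
the followers of user #1 in a subquery and then aggregate their links.  This is
only a sketch; it assumes users.id is unique and leans on the unique
(followed_id, follower_id) index, in which case the counts match COUNT(U1.id)
above.)

   -- Sketch only: equivalent under the uniqueness assumptions noted above.
   SELECT L2.followed_id AS followed_id, COUNT(*) AS count
     FROM links AS L2
    WHERE L2.follower_id IN (SELECT L1.follower_id
                               FROM links AS L1
                               JOIN users AS U1 ON U1.id = L1.follower_id
                              WHERE L1.followed_id = 1
                                AND U1.type = 'User::Twitter')
    GROUP BY L2.followed_id;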

Here's the rest of the info.

=== versions

psql (9.1.2, server 8.3.14)

=== schema

 create_table "users", :force => true do |t|
   t.string   "type"
 end

 create_table "links", :force => true do |t|
   t.integer  "followed_id"
   t.integer  "follower_id"
 end

 add_index "links", ["follower_id"], :name => "index_links_on_follower_id"
 add_index "links", ["followed_id", "follower_id"], :name =>
"index_links_on_followed_id_and_follower_id", :unique => true
 add_index "links", ["followed_id"], :name => "index_links_on_followed_id"

=== # of rows

users: 2,525,539
links: 4,559,390

=== explain

 "HashAggregate  (cost=490089.52..490238.78 rows=11941 width=8)"
 "  ->  Hash Join  (cost=392604.44..483190.22 rows=1379860 width=8)"
 "Hash Cond: (f1.follower_id = u1.id)"
 "->  Bitmap Heap Scan on links f1  (cost=14589.95..55597.70
rows=764540 width=4)"
 "  Recheck Cond: (followed_id = 1)"
 "  ->  Bitmap Index Scan on index_links_on_followed_id
(cost=0.00..14398.82 rows=764540 width=0)"
 "Index Cond: (followed_id = 1)"
 "->  Hash  (cost=300976.98..300976.98 rows=4559881 width=12)"
 "  ->  Hash Join  (cost=94167.40..300976.98 rows=4559881 width=12)"
 "Hash Cond: (f2.follower_id = u1.id)"
 "->  Seq Scan on links f2  (cost=0.00..77049.81
rows=4559881 width=8)"
 "->  Hash  (cost=53950.20..53950.20 rows=2526496 width=4)"
 "  ->  Seq Scan on users u1
(cost=0.00..53950.20 rows=2526496 width=4)"
 "Filter: ((type)::text =
'User::Twitter'::text)"

=== other comments

I'm assuming I'm doing something obviously stupid and that the above
info will be sufficient for anyone skilled in the art to detect the
problem.  However, if needed I will gladly invest the time
to create a subset of the data in order to run EXPLAIN ANALYZE.  (With
the whole dataset, it requires > 7 hours to complete the query.  I
don't want to go down that path again!)
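
(One low-cost way to get EXPLAIN ANALYZE numbers without the 7-hour run is to
copy a small random sample of links into a scratch table; a sketch only, with
hypothetical table and index names.)

   -- Hypothetical ~1% sample of links, to make EXPLAIN ANALYZE affordable.
   CREATE TABLE links_sample AS SELECT * FROM links WHERE random() < 0.01;
   CREATE INDEX links_sample_followed_id ON links_sample (followed_id);
   ANALYZE links_sample;

   EXPLAIN ANALYZE
   SELECT L2.followed_id AS followed_id, COUNT(U1.id) AS count
     FROM users AS U1
     JOIN links_sample AS L1 ON L1.follower_id = U1.id
     JOIN links_sample AS L2 ON L2.follower_id = U1.id
    WHERE U1.type = 'User::Twitter'
      AND L1.followed_id = 1
    GROUP BY L2.followed_id;

The plan chosen for a sample will not necessarily match the plan for the full
tables, but it does show where the row estimates diverge from reality.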

- rdp



Re: [PERFORM] slow self-join query

2012-03-17 Thread Scott Marlowe
On Sat, Mar 17, 2012 at 2:56 PM, Robert Poor  wrote:
> The following query is abysmally slow (e.g. 7 hours+).  The goal is to
> find "among the users that follow user #1, who do they also follow?"
> and to count the latter.
>
> [query, schema, row counts, and EXPLAIN output trimmed; see the original
> post above]

Do you have an index on users.type?
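
(For readers following along: the schema above only shows indexes on links, so
here is a sketch of the index being asked about, with a hypothetical name.
Whether it helps depends on how selective type = 'User::Twitter' is; if most
users have that type, the planner will likely stick with a sequential scan.)

   -- Hypothetical index name; only useful if 'User::Twitter' is a small
   -- fraction of the users table.
   CREATE INDEX index_users_on_type ON users (type);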
