[PERFORM] Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m

2012-08-07 Thread Stefan Keller
Hi

I have an interesting query to be optimized related to this one [1].

The query definition is: Select all buildings that have more than 1
pharmacies and more than 1 schools within a radius of 1000m.

The problem is that I think that this query is inherently O(n^2). In
fact the solution I propose below takes forever...

My questions:

1. Any comments about the nature of this problem?

2. ... on how to speed it up ?

3. In the original query [1] there's a count() which contains a
subquery. According to my tests PostgreSQL does not allow this,
despite the documentation saying "count(expression)".

Remarks: I know that "count(*)" could be faster on PostgreSQL but
"count(osm_id)" does not change the query plan and this does not seem
to be the bottleneck here anyway.

Yours, S.

[1] 
http://gis.stackexchange.com/questions/11445/selecting-pois-around-specific-buildings-using-postgis


Here's my query:

-- Select all buildings that have >1 pharmacies and >1 schools within 1000m:
SELECT osm_id AS building_id
FROM
  (SELECT osm_id, way
   FROM osm_polygon
   WHERE tags @> hstore('building','yes')
  ) AS b
WHERE
 (SELECT count(*) > 1
  FROM osm_poi AS p
  WHERE p.tags @> hstore('amenity','pharmacy')
  AND ST_DWithin(b.way,p.way,1000)
 )
 AND
 (SELECT count(*) > 1
  FROM osm_poi AS p
  WHERE p.tags @> hstore('amenity','school')
  AND ST_DWithin(b.way,p.way,1000)
 )
-- Total query runtime: 4308488 ms. 66345 rows retrieved.

Here's the query plan (from EXPLAIN):
"Index Scan using osm_polygon_tags_idx on osm_polygon  (cost=0.00..406812.81 rows=188 width=901)"
"  Index Cond: (tags @> '"building"=>"yes"'::hstore)"
"  Filter: ((SubPlan 1) AND (SubPlan 2))"
"  SubPlan 1"
"    ->  Aggregate  (cost=269.19..269.20 rows=1 width=0)"
"          ->  Bitmap Heap Scan on osm_poi p  (cost=7.76..269.19 rows=1 width=0)"
"                Recheck Cond: (way && st_expand(osm_polygon.way, 1000::double precision))"
"                Filter: ((tags @> '"amenity"=>"pharmacy"'::hstore) AND (osm_polygon.way && st_expand(way, 1000::double precision)) AND _st_dwithin(osm_polygon.way, way, 1000::double precision))"
"                ->  Bitmap Index Scan on osm_poi_way_idx  (cost=0.00..7.76 rows=62 width=0)"
"                      Index Cond: (way && st_expand(osm_polygon.way, 1000::double precision))"
"  SubPlan 2"
"    ->  Aggregate  (cost=269.19..269.20 rows=1 width=0)"
"          ->  Bitmap Heap Scan on osm_poi p  (cost=7.76..269.19 rows=1 width=0)"
"                Recheck Cond: (way && st_expand(osm_polygon.way, 1000::double precision))"
"                Filter: ((tags @> '"amenity"=>"school"'::hstore) AND (osm_polygon.way && st_expand(way, 1000::double precision)) AND _st_dwithin(osm_polygon.way, way, 1000::double precision))"
"                ->  Bitmap Index Scan on osm_poi_way_idx  (cost=0.00..7.76 rows=62 width=0)"
"                      Index Cond: (way && st_expand(osm_polygon.way, 1000::double precision))"

***



Re: [PERFORM] Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m

2012-08-07 Thread Tomas Vondra
On 7 August 2012, 14:01, Stefan Keller wrote:
> Hi
>
> I have an interesting query to be optimized related to this one [1].
>
> The query definition is: Select all buildings that have more than 1
> pharmacies and more than 1 schools within a radius of 1000m.
>
> The problem is that I think that this query is inherently O(n^2). In
> fact the solution I propose below takes forever...

What about plain INTERSECT? Something like

SELECT osm_id FROM osm_poi AS p, osm_polygon b
   WHERE p.tags @> hstore('amenity','pharmacy')
   AND ST_DWithin(b.way,p.way,1000)
INTERSECT
SELECT osm_id FROM osm_poi AS p, osm_polygon b
   WHERE p.tags @> hstore('amenity','school')
   AND ST_DWithin(b.way,p.way,1000)

Or something like that. But maybe it's complete nonsense ...

Tomas




Re: [PERFORM] Sequential scan instead of index scan

2012-08-07 Thread Ioannis Anagnostopoulos

Offhand I'd have thought that ANALYZE would gather stats on the
date_trunc expression (because it is indexed) and then you should get
something reasonably accurate for a comparison to a constant.
"Reasonably accurate" meaning "not off by two orders of magnitude".
Practically all of your runtime is going into this one indexscan,
and TBH it seems likely you'd be better off with a seqscan there.

regards, tom lane
You were right, after running ANALYZE on the temp table I eventually got 
the HASH JOIN we were talking about. Here is the plan:


"Hash Join  (cost=379575.54..1507341.18 rows=95142 width=128) (actual 
time=3128.940..634179.270 rows=10495795 loops=1)"
"  Hash Cond: (feed_all_y2012m08.ship_pos_messages.msg_id = 
tmp_tbl_messages.msg_id)"
"  ->  Append  (cost=0.00..1073525.24 rows=95142 width=128) (actual 
time=37.157..599002.314 rows=18891614 loops=1)"
"->  Seq Scan on ship_pos_messages  (cost=0.00..0.00 rows=1 
width=100) (actual time=0.001..0.001 rows=0 loops=1)"
"  Filter: ((date_part('day'::text, msg_date_rec) = 
2::double precision) AND (date_trunc('day'::text, msg_date_rec) = 
'2012-08-02 00:00:00'::timestamp without time zone))"
"->  Seq Scan on ship_a_pos_messages ship_pos_messages 
(cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0 
loops=1)"
"  Filter: ((date_part('day'::text, msg_date_rec) = 
2::double precision) AND (date_trunc('day'::text, msg_date_rec) = 
'2012-08-02 00:00:00'::timestamp without time zone))"
"->  Index Scan using idx_ship_b_std_pos_messages_date_trunc on 
ship_b_std_pos_messages ship_pos_messages  (cost=0.00..48111.95 
rows=4323 width=128) (actual time=37.156..23782.030 rows=808692 loops=1)"
"  Index Cond: (date_trunc('day'::text, msg_date_rec) = 
'2012-08-02 00:00:00'::timestamp without time zone)"
"  Filter: (date_part('day'::text, msg_date_rec) = 2::double 
precision)"
"->  Index Scan using idx_ship_b_ext_pos_messages_date_trunc on 
ship_b_ext_pos_messages ship_pos_messages  (cost=0.00..1844.30 rows=154 
width=128) (actual time=42.042..1270.104 rows=28656 loops=1)"
"  Index Cond: (date_trunc('day'::text, msg_date_rec) = 
'2012-08-02 00:00:00'::timestamp without time zone)"
"  Filter: (date_part('day'::text, msg_date_rec) = 2::double 
precision)"
"->  Index Scan using idx_ship_a_pos_messages_wk0_date_trunc on 
ship_a_pos_messages_wk0 ship_pos_messages  (cost=0.00..1023568.99 
rows=90663 width=128) (actual time=51.181..571590.415 rows=18054266 
loops=1)"
"  Index Cond: (date_trunc('day'::text, msg_date_rec) = 
'2012-08-02 00:00:00'::timestamp without time zone)"
"  Filter: (date_part('day'::text, msg_date_rec) = 2::double 
precision)"
"  ->  Hash  (cost=177590.46..177590.46 rows=12311446 width=8) (actual 
time=3082.762..3082.762 rows=12311446 loops=1)"

"Buckets: 524288  Batches: 4  Memory Usage: 120316kB"
"->  Seq Scan on tmp_tbl_messages (cost=0.00..177590.46 
rows=12311446 width=8) (actual time=0.022..1181.376 rows=12311446 loops=1)"

"Total runtime: 634764.596 ms"

The time looks reasonable but is still quite high for the overnight job I
need it for (I have to run around 30 of those). So now that the join has
been sorted out, I think I need to do something about the difference
between the actual and estimated row counts in:


"->  Index Scan using idx_ship_a_pos_messages_wk0_date_trunc on 
ship_a_pos_messages_wk0 ship_pos_messages  (cost=0.00..1023568.99 
rows=90663 width=128) (actual time=51.181..571590.415 rows=18054266 
loops=1)"
"  Index Cond: (date_trunc('day'::text, msg_date_rec) = 
'2012-08-02 00:00:00'::timestamp without time zone)"
"  Filter: (date_part('day'::text, msg_date_rec) = 2::double 
precision)"


From what I understand, a possible solution is to increase the statistics
target for the particular column(?). Any suggestions there? I assume we
are talking about msg_date_rec, the column the index is built upon.
Finally, I do understand what you say about the seq scan. However, in
this case I consistently get about 10 min per execution, while the seq
scan was giving me almost nothing at best; usually it ran for so long
that it eventually caused problems on my server...
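
I assume the suggestion amounts to something like this (the statistics
value is picked arbitrarily; untested):

ALTER TABLE ship_a_pos_messages_wk0
    ALTER COLUMN msg_date_rec SET STATISTICS 1000;  -- default is 100
ANALYZE ship_a_pos_messages_wk0;

although, since the index is on the expression date_trunc('day',
msg_date_rec) rather than on the raw column, I am not sure the per-column
target reaches the expression statistics.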

Kind Regards
Yiannis



Re: [PERFORM] Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m

2012-08-07 Thread Stefan Keller
Your proposal lacks the requirement that it's the same building from
which both the pharmacies and the schools are reachable.
But I'll think about it.

Yours, S.

2012/8/7 Tomas Vondra :
> On 7 Srpen 2012, 14:01, Stefan Keller wrote:
>> Hi
>>
>> I have an interesting query to be optimized related to this one [1].
>>
>> The query definition is: Select all buildings that have more than 1
>> pharmacies and more than 1 schools within a radius of 1000m.
>>
>> The problem is that I think that this query is inherently O(n^2). In
>> fact the solution I propose below takes forever...
>
> What about plain INTERSECT? Something like
>
> SELECT osm_id FROM osm_poi AS p, osm_polygon b
>WHERE p.tags @> hstore('amenity','pharmacy')
>AND ST_DWithin(b.way,p.way,1000)
> INTERSECT
> SELECT osm_id FROM osm_poi AS p, osm_polygon b
>WHERE p.tags @> hstore('amenity','school')
>AND ST_DWithin(b.way,p.way,1000)
>
> Or something like that. But maybe it's a complete nonsense ...
>
> Tomas
>



Re: [PERFORM] Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m

2012-08-07 Thread Tomas Vondra
On 7 August 2012, 14:22, Stefan Keller wrote:
> Your proposal lacks the requirement that it's the same building from
> where pharmacies and schools are reachable.
> But I think about.

I don't know the dataset, so I expected the osm_id to identify the
building - then the intersect should work as an AND over the conditions.

And I see I've forgotten to include the 'is building' condition, so it
should be like this:

 SELECT b.osm_id FROM osm_poi AS p, osm_polygon b
WHERE p.tags @> hstore('amenity','pharmacy')
AND b.tags @> hstore('building','yes')
AND ST_DWithin(b.way,p.way,1000)
 INTERSECT
 SELECT b.osm_id FROM osm_poi AS p, osm_polygon b
WHERE p.tags @> hstore('amenity','school')
AND b.tags @> hstore('building','yes')
AND ST_DWithin(b.way,p.way,1000)

Tomas




Re: [PERFORM] Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m

2012-08-07 Thread Craig James
On Tue, Aug 7, 2012 at 5:01 AM, Stefan Keller  wrote:

> Hi
>
> I have an interesting query to be optimized related to this one [1].
>
> The query definition is: Select all buildings that have more than 1
> pharmacies and more than 1 schools within a radius of 1000m.
>
> The problem is that I think that this query is inherently O(n^2). In
> fact the solution I propose below takes forever...
>

Maybe you could get rid of the O(n^2) aspect like this:

   Select all buildings that have more than 1
   pharmacies and more than 1 schools within a radius of 1000m
   from
  (Select all buildings that have more than four (pharmacy or school)
within a radius of 1000m)

The inner select should be fast -- you could make it fast by creating a new
property like "building of interest" that means "pharmacy or school" and
building an index on that property.

The inner query would reduce your sample set to a much smaller set of
buildings, and presumably the outer query could handle that pretty quickly.
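
In SQL that sketch might look roughly like this (reusing the tables from
your query; ">= 4" because more than one of each kind means at least
2 + 2 nearby amenities; untested):

SELECT c.building_id
FROM
  (SELECT b.osm_id AS building_id, b.way
   FROM osm_polygon b
   WHERE b.tags @> hstore('building','yes')
     -- cheap pre-filter: at least four nearby amenities of either kind
     AND (SELECT count(*)
          FROM osm_poi p
          WHERE (p.tags @> hstore('amenity','pharmacy')
                 OR p.tags @> hstore('amenity','school'))
            AND ST_DWithin(b.way, p.way, 1000)) >= 4
  ) AS c
WHERE
  (SELECT count(*) FROM osm_poi p
   WHERE p.tags @> hstore('amenity','pharmacy')
     AND ST_DWithin(c.way, p.way, 1000)) > 1
  AND
  (SELECT count(*) FROM osm_poi p
   WHERE p.tags @> hstore('amenity','school')
     AND ST_DWithin(c.way, p.way, 1000)) > 1;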

Craig James




Re: [PERFORM] Sequential scan instead of index scan

2012-08-07 Thread Jeff Janes
On Mon, Aug 6, 2012 at 8:08 AM, Ioannis Anagnostopoulos
 wrote:
> Hi, my query is very simple:
>
> select
> msg_id,
> msg_type,
> ship_pos_messages.pos_georef1,
> ship_pos_messages.pos_georef2,
> ship_pos_messages.pos_georef3,
> ship_pos_messages.pos_georef4,
> obj_id,
> ship_speed,
> ship_heading,
> ship_course,
> pos_point
> from
> feed_all_y2012m08.ship_pos_messages
> where
> extract('day' from msg_date_rec) = 1
> AND msg_id = any(ARRAY[7294724,14174174,22254408]);
>
> The msg_id is the pkey on the ship_pos_messages table and in this example it
> is working fast as it uses the pkey (primary key index) to make the
> selection. The EXPLAIN ANALYZE follows:
...
>
> I think this is a pretty good plan and quite quick given the size of the
> table (88 million rows at present). However in real life the parameter where
> I search for msg_id is not an array of 3 ids but of 300,000 or more. It is
> then that the query abandons this plan and goes to a sequential scan. Is
> there any way around it? Or is this the best I can get?

What happens if you set "enable_seqscan=off" and run the query with
the very large list?  (This is an experiment, not a recommendation for
production use)
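
Concretely, something like this in one session (sketch; the three-element
array stands in for your real 300,000-element list):

SET enable_seqscan = off;
EXPLAIN ANALYZE
select msg_id, msg_type, pos_point  -- ... and the other columns
from feed_all_y2012m08.ship_pos_messages
where extract('day' from msg_date_rec) = 1
AND msg_id = any(ARRAY[7294724,14174174,22254408]);
RESET enable_seqscan;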


Cheers,

Jeff



Re: [PERFORM] Sequential scan instead of index scan

2012-08-07 Thread Ioannis Anagnostopoulos

On 07/08/2012 17:00, Jeff Janes wrote:
> On Mon, Aug 6, 2012 at 8:08 AM, Ioannis Anagnostopoulos wrote:
>> [original query and question snipped]
>
> What happens if you set "enable_seqscan=off" and run the query with
> the very large list?  (This is an experiment, not a recommendation for
> production use)
>
> Cheers,
>
> Jeff
As Tom said, the actual question is not valid. Seq scans are not bad; we
just need to understand the way around them instead of forcing them off.
In my case, the problem was the ARRAY as a parameter (which, all in all,
is not that great for holding so much data). By converting it into a
temporary table and performing an inner join in the query (after
analysing the temp table) you get a nice Hash Join (or a Merge Join if
you don't analyse the temp table).
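
Schematically, what I ended up with (abbreviated, untested):

CREATE TEMP TABLE tmp_tbl_messages (msg_id bigint);
-- bulk-load the ~300,000 ids here, e.g. with COPY or INSERT
ANALYZE tmp_tbl_messages;  -- skip this and you get a Merge Join instead

SELECT m.*
FROM feed_all_y2012m08.ship_pos_messages m
     JOIN tmp_tbl_messages t ON t.msg_id = m.msg_id
WHERE date_trunc('day', m.msg_date_rec) = '2012-08-02';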


cheers Yiannis



Re: [PERFORM] Sequential scan instead of index scan

2012-08-07 Thread Jeff Janes
On Tue, Aug 7, 2012 at 9:06 AM, Ioannis Anagnostopoulos
 wrote:
> On 07/08/2012 17:00, Jeff Janes wrote:
>>
>> What happens if you set "enable_seqscan=off" and run the query with
>> the very large list?  (This is an experiment, not a recommendation for
>> production use)
>>
>>
>> Cheers,
>>
>> Jeff
>
> As Tom said, the actual question is not valid. Seq scans are not bad,

Right, that is why I proposed it as an experiment, not for production use.

> we just
> need to understand the way around them instead of forcing them off.

I think the first step to understanding the way around it is to force
it off, and see what the planner thinks it's next best option is, and
why it thinks that.


> In my
> case, the problem was the ARRAY as a parameter (which, all in all, is not
> that great for holding so much data).

I think the only thing that is great for holding that much data is a
query against live permanent tables which returns it.  Given the
choice between stuffing it in an ARRAY and stuffing it in a temp table
and then manually analyzing it, neither one of those seems
fundamentally better than the other at the scale of 300,000.


> By converting it into a temporary
> table and performing an inner join in the query (after analysing the temp
> table) you get a nice Hash join (or Merge Join if you don't analyse the temp
> table).

I don't see those as being very good.  The "primary key" part of the
query is far more selective than the date part, so what you are doing
is fetching a huge number of rows only to throw out the vast majority
of them.

I think the optimal plan would be a bitmap scan on the indexes of the
"primary key" column.  This should automatically take advantage of the
sequential read nature of the table data to the extent the results are
well clustered, and if they aren't clustered it should benefit from
effective_io_concurrency if that is set appropriately.
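
For reference, that last knob is just a session-settable GUC; the value
here is a hypothetical starting point (roughly one per spindle):

SET effective_io_concurrency = 4;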

Cheers,

Jeff



[PERFORM] Is drop/restore trigger transactional?

2012-08-07 Thread Craig James
I found this discussion from 2005 that says you can drop and restore a
trigger inside a transaction, but that doing so locks the whole table:

http://archives.postgresql.org/pgsql-general/2005-01/msg01347.php
> From: Jeff Davis
>
> It got me curious enough that I tested it, and apparently dropping a
> trigger locks the table. Any actions on that table must wait until the
> transaction that drops the trigger finishes.
>
> So, technically my system works, but requires a rather nasty lock while
> the transaction (the one that doesn't want the trigger to execute)
> finishes.

I have a process that copies customer data from one database to
another, and we know that the trigger has already done its work.  The
trigger is thus redundant, but it slows the copy WAY down, so I wanted
to drop/restore it inside a transaction.

Is it still true that drop-trigger inside a transaction will lock the
whole table?  We're using 8.4.

Thanks,
Craig



Re: [PERFORM] Is drop/restore trigger transactional?

2012-08-07 Thread Merlin Moncure
On Tue, Aug 7, 2012 at 1:48 PM, Craig James  wrote:
> I found this discussion from 2005 that says you can drop and restore a
> trigger inside a transaction, but that doing so locks the whole table:
>
> http://archives.postgresql.org/pgsql-general/2005-01/msg01347.php
>> From: Jeff Davis
>>
>> It got me curious enough that I tested it, and apparently dropping a
>> trigger locks the table. Any actions on that table must wait until the
>> transaction that drops the trigger finishes.
>>
>> So, technically my system works, but requires a rather nasty lock while
>> the transaction (the one that doesn't want the trigger to execute)
>> finishes.
>
> I have a process that copies customer data from one database to
> another, and we know that the trigger has already done its work.  The
> trigger is thus redundant, but it slows the copy WAY down, so I wanted
> to drop/restore it inside a transaction.
>
> Is it still true that drop-trigger inside a transaction will lock the
> whole table?  We're using 8.4.

absolutely -- the database needs to guard against other writers to the
table doing inserts in the meantime.  there's no concept in SQL of
'enforce this trigger for all writers, except for me' nor should there
be.

one possible workaround is to hack your trigger function so that it
doesn't operate for particular roles.  so your trigger might be:

IF current_user = 'bulk_writer' THEN
  return new;
END IF;


then you can log in with the bulk_writer role when you want to bypass
the checks.  if your triggers are RI triggers though, you're hosed.
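
spelled out, the function would have roughly this shape (names
hypothetical, untested):

CREATE OR REPLACE FUNCTION order_compound_check() RETURNS trigger AS $$
BEGIN
    -- trusted bulk-copy path: skip the expensive validation entirely
    IF current_user = 'bulk_writer' THEN
        RETURN NEW;
    END IF;
    -- ... the expensive validation queries go here ...
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;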

merlin



Re: [PERFORM] Is drop/restore trigger transactional?

2012-08-07 Thread Scott Marlowe
On Tue, Aug 7, 2012 at 2:15 PM, Merlin Moncure  wrote:
> On Tue, Aug 7, 2012 at 1:48 PM, Craig James  wrote:
>> [snip]
>
> absolutely -- the database needs to guard against other writers to the
> table doing inserts in the meantime.  there's no concept in SQL of
> 'enforce this trigger for all writers, except for me' nor should there
> be.
>
> one possible workaround is to hack your trigger function so that it
> doesn't operate for particular roles.  so your trigger might be:
>
> IF current_user = 'bulk_writer' THEN
>   return new;
> END IF;
> 
>
> then you can log in with the bulk_writer role when you want to bypass
> the checks.  if your triggers are RI triggers though, you're hosed.

I'm willing to bet that even if the trigger body does nothing, just
invoking the trigger will still cost a LOT more than having it turned
off.



Re: [PERFORM] Is drop/restore trigger transactional?

2012-08-07 Thread Jeff Janes
On Tue, Aug 7, 2012 at 1:15 PM, Merlin Moncure  wrote:
> On Tue, Aug 7, 2012 at 1:48 PM, Craig James  wrote:
>> [snip]
>
> absolutely -- the database needs to guard against other writers to the
> table doing inserts in the meantime.

But why must it?  Why can't other writers simply obey the trigger,
since its removal has not yet been committed?  You could have the
anomaly that a longer-running later-committing transaction used the
old trigger while a shorter-running earlier-committing transaction
used the new one (which isn't really an anomaly if the old and new are
identical), but is that even barred if neither of them is in
serializable mode?  And since triggers can do pretty much anything
they want internally, there isn't much of a transactional guarantee
with them anyway.

> there's no concept in SQL of
> 'enforce this trigger for all writers, except for me' nor should there
> be.

Why shouldn't there be, other than the bother of implementing and
documenting it?  Sometimes theory needs to compromise with reality.
When we don't provide slightly dangerous ways to make those
compromises, people are forced to use very dangerous ways instead.

>
> one possible workaround is to hack your trigger function so that it
> doesn't operate for particular roles.  so your trigger might be:
>
> IF current_user = 'bulk_writer' THEN
>   return new;
> END IF;
> 

I don't know Craig's case, but often the most expensive of the
"expensive stuff" is the bare fact of firing a trigger in the first
place.

cheers,

Jeff



Re: [PERFORM] Is drop/restore trigger transactional?

2012-08-07 Thread Merlin Moncure
On Tue, Aug 7, 2012 at 3:45 PM, Jeff Janes  wrote:
>> absolutely -- the database needs to guard against other writers to the
>> table doing inserts in the meantime.
>
> But why must it?  Why can't other writers simply obey the trigger,
> since its removal has not yet been committed?  You could have the
> anomaly that a longer-running later-committing transaction used the
> old trigger while a shorter-running earlier-committing transaction
> used the new one (which isn't really an anomaly if the old and new are
> identical), but is that even barred if neither of them is in
> serializable mode?  And since triggers can do pretty much anything
> they want internally, there isn't much of a transactional guarantee
> with them anyway.

Triggers give a 100% transactional guarantee, period.  Yes, you can do
things in them that violate MVCC, like make dblink calls, but you can
do that from any SQL statement; they are no less transactionally
guaranteed than regular SQL. As to your wider point, you could in
theory interleave other work with adjustment of triggers although it
seems pretty complicated and weird.  Also RI triggers (the most
important case) would need special handling since (like check
constraints) they are supposed to apply to the table as a whole, not
records inserted since trigger creation.   Also serializable would be
right out as you noted.

>> there's no concept in SQL of
>> 'enforce this trigger for all writers, except for me' nor should there
>> be.
>
> Why shouldn't there be, other than the bother of implementing and
> documenting it?  Sometimes theory needs to compromise with reality.
> When we don't provide slightly dangerous ways to make those
> compromises, people are forced to use very dangerous ways instead.
>
>>
>> one possible workaround is to hack your trigger function so that it
>> doesn't operate for particular roles.  so your trigger might be:
>>
>> IF current_user = 'bulk_writer' THEN
>>   return new;
>> END IF;
>> 
>
> I don't know Craig's case, but often the most expensive of the
> "expensive stuff" is the bare fact of firing a trigger in the first
> place.

That's highly debatable.  a function call is somewhat expensive but is
a fixed cpu cost.  RI triggers or complicated queries can really get
expensive, especially with large tables.

merlin



Re: [PERFORM] Is drop/restore trigger transactional?

2012-08-07 Thread Craig James
On Tue, Aug 7, 2012 at 1:45 PM, Jeff Janes  wrote:
> On Tue, Aug 7, 2012 at 1:15 PM, Merlin Moncure  wrote:
>> On Tue, Aug 7, 2012 at 1:48 PM, Craig James  wrote:
>>> [snip]
>>
>> absolutely -- the database needs to guard against other writers to the
>> table doing inserts in the meantime.
>
> But why must it?  Why can't other writers simply obey the trigger,
> since its removal has not yet been committed?
>> there's no concept in SQL of
>> 'enforce this trigger for all writers, except for me' nor should there
>> be.
>
> Why shouldn't there be, other than the bother of implementing and
> documenting it?  Sometimes theory needs to compromise with reality.
> When we don't provide slightly dangerous ways to make those
> compromises, people are forced to use very dangerous ways instead.
>
>>
>> one possible workaround is to hack your trigger function so that it
>> doesn't operate for particular roles.  so your trigger might be:
>>
>> IF current_user = 'bulk_writer' THEN
>>   return new;
>> END IF;
>> 
>
> I don't know Craig's case, but often the most expensive of the
> "expensive stuff" is the bare fact of firing a trigger in the first
> place.

My use case is pretty simple: Copy some already-validated user data
from one schema to another.  Since the trigger has already been
applied, we're guaranteed that the data is already in the form we
want.

For your amusement: The trigger ensures that you can't buy illegal
drugs, explosives, weapons of war, corrosives and other dangerous or
illegal chemical compounds.  It executes a query against known
compounds from the DEA, Homeland Security, Department of
Transportation and several other lists.  Then calls a series of
functions that implement "rules" to find illegal or dangerous
compounds that aren't on anyone's list.  Some examples: "cocaine
derivatives" for obvious reasons; "two or more nitro groups on a small
molecule" to find chemicals that might explode; and "Metal-hydrogen
bond" to find things that will catch fire if exposed to air.

This is implemented in the database to ensure that no matter how badly
a programmer screws up an app, you still can't get these chemical
compounds into an order.  The chemicals need to be in our database for
informational purposes, but we don't want law enforcement knocking on
our door.

Obviously this is a very expensive trigger, but one that we can drop
in a very specific circumstance.  But we NEVER want to drop it for
everyone.  It seems like a very reasonable use-case to me.

Craig James



Re: [PERFORM] Is drop/restore trigger transactional?

2012-08-07 Thread Samuel Gendler
On Tue, Aug 7, 2012 at 2:39 PM, Craig James  wrote:

>
> Obviously this is a very expensive trigger, but one that we can drop
> in a very specific circumstance.  But we NEVER want to drop it for
> everyone.  It seems like a very reasonable use-case to me.
>
>
Sounds like you should try doing the work inside the trigger conditionally
and see if that improves performance enough, since you aren't likely to get
anything that better suits your needs without patching postgres.


Re: [PERFORM] Is drop/restore trigger transactional?

2012-08-07 Thread Merlin Moncure
On Tue, Aug 7, 2012 at 4:39 PM, Craig James  wrote:
> [snip]
>
> Obviously this is a very expensive trigger, but one that we can drop
> in a very specific circumstance.  But we NEVER want to drop it for
> everyone.  It seems like a very reasonable use-case to me.

well, there you go:  create a role that is excepted from having to run
through those checks and take appropriate precautions (password,
pg_hba.conf etc) so that only people/things that are supposed to
bypass the checks can do so.  then the trigger can look for the role
and punt.

merlin



Re: [PERFORM] Is drop/restore trigger transactional?

2012-08-07 Thread Jeff Janes
On Tue, Aug 7, 2012 at 2:39 PM, Craig James  wrote:
> On Tue, Aug 7, 2012 at 1:45 PM, Jeff Janes  wrote:
>> On Tue, Aug 7, 2012 at 1:15 PM, Merlin Moncure  wrote:
>>>
>>> IF current_user = 'bulk_writer' THEN
>>>   return new;
>>> END IF;
>>> 
>>
>> I don't know Craig's case, but often the most expensive of the
>> "expensive stuff" is the bare fact of firing a trigger in the first
>> place.
>
> My use case is pretty simple: Copy some already-validated user data
> from one schema to another.  Since the trigger has already been
> applied, we're guaranteed that the data is already in the form we
> want.
>
> For your amusement:

Thanks.  That was probably more amusing to me in particular than to most
pgsql hackers, as I think I've been a victim of your trigger.


...
>
> Obviously this is a very expensive trigger, but one that we can drop
> in a very specific circumstance.  But we NEVER want to drop it for
> everyone.  It seems like a very reasonable use-case to me.

And since the query is absolutely expensive, not just expensive
relative to a no-op, then Merlin's suggestion seems entirely suitable
for your use-case.

Cheers,

Jeff



Re: [PERFORM] Is drop/restore trigger transactional?

2012-08-07 Thread Craig James
On Tue, Aug 7, 2012 at 3:22 PM, Jeff Janes  wrote:
> On Tue, Aug 7, 2012 at 2:39 PM, Craig James  wrote:
>> [snip]
>
> And since the query is absolutely expensive, not just expensive
> relative to a no-op, then Merlin's suggestion seems entirely suitable
> for your use-case.

Thanks for the ideas.  I think I have something to work with.

Craig James



Re: [PERFORM] Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m

2012-08-07 Thread Stefan Keller
Hi Craig

Clever proposal!
I tried to adapt it slightly to the hstore involved.
Now I'm having a weird problem that PG says that "relation 'p' does not exist".
Why does PG recognize table b in the subquery but not table p?
Any ideas?

-- Stefan


SELECT b.way AS building_geometry
FROM
 (SELECT way
  FROM osm_polygon
  WHERE tags @> hstore('building','yes')
 ) AS b,
 (SELECT way, tags->'amenity' as value
  FROM osm_poi
  WHERE tags ? 'amenity'
 ) AS p
WHERE
 (SELECT count(*) > 1
  FROM p
  WHERE p.value = 'pharmacy'
  AND ST_DWithin(b.way,p.way,1000)
 )
 AND
 (SELECT count(*) > 1
  FROM p
  WHERE p.value = 'school'
  AND ST_DWithin(b.way,p.way,1000)
 )

ERROR:  relation "p" does not exist
LINE 14:   FROM p


2012/8/7 Craig James :
> On Tue, Aug 7, 2012 at 5:01 AM, Stefan Keller  wrote:
>>
>> Hi
>>
>> I have an interesting query to be optimized related to this one [1].
>>
>> The query definition is: Select all buildings that have more than 1
>> pharmacies and more than 1 schools within a radius of 1000m.
>>
>> The problem is that I think that this query is inherently O(n^2). In
>> fact the solution I propose below takes forever...
>
>
> Maybe you could get rid of the O(n^2) aspect like this:
>
>
>Select all buildings that have more than 1
>pharmacies and more than 1 schools within a radius of 1000m
>from
>   (Select all buildings that have more than four (pharmacy or school)
> within a radius of 1000m)
>
> The inner select should be fast -- you could make it fast by creating a new
> property like "building of interest" that was "pharmacy or school" and build
> an index on the "building of interest" property.
>
> The inner query would reduce your sample set to a much smaller set of
> buildings, and presumably the outer query could handle that pretty quickly.
>
> Craig James
>
>> [remainder of the original message snipped]

Re: [PERFORM] Is drop/restore trigger transactional?

2012-08-07 Thread Craig Ringer

On 08/08/2012 04:15 AM, Merlin Moncure wrote:
> IF current_user = 'bulk_writer' THEN
>   return new;
> END IF;

... or re-create the trigger with a `WHEN` clause (only available in 
newer Pg versions, see CREATE TRIGGER) that excludes the migrated 
customer ID. You'd have to do it in a new tx to avoid locking the table 
for ages though.
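
For example (hypothetical table, trigger function and customer id;
untested):

CREATE TRIGGER check_compounds
    BEFORE INSERT OR UPDATE ON orders
    FOR EACH ROW
    WHEN (NEW.customer_id IS DISTINCT FROM 12345)  -- skip the migrated customer
    EXECUTE PROCEDURE order_compound_check();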


--
Craig Ringer



Re: [PERFORM] Is drop/restore trigger transactional?

2012-08-07 Thread Merlin Moncure
On Tue, Aug 7, 2012 at 5:29 PM, Craig Ringer  wrote:
> On 08/08/2012 04:15 AM, Merlin Moncure wrote:
>>
>> IF current_user = 'bulk_writer' THEN
>>return new;
>> END IF;
>> 
>
> ... or re-create the trigger with a `WHEN` clause (only available in newer
> Pg versions, see CREATE TRIGGER) that excludes the migrated customer ID.
> You'd have to do it in a new tx to avoid locking the table for ages though.

yeah --- and, locking aside, I'd advise you not to do that anyways:
try and keep one block of code that enforces all the rules properly.
also, good deployment practices (especially in cases of security
sensitive environments) should have good firewalls between production
services and developer introduced code.

merlin



Re: [PERFORM] Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m

2012-08-07 Thread Jeff Janes
On Tue, Aug 7, 2012 at 5:07 PM, Stefan Keller  wrote:
> Hi Craig
>
> Clever proposal!
> I tried to adapt it slightly to the hstore involved.
> Now I'm having a weird problem that PG says that "relation 'p' does not 
> exist".
> Why does PG recognize table b in the subquery but not table p?
> Any ideas?

I don't think it does recognize b, either.  It just fell over on p
before it had a chance to fall over on b.

I think you have to use WITH if you want to reference the same
subquery in multiple FROMs.

Another approach would be to add explicit conditions for there being
at least 1 school and 1 pharmacy within distance.  There can't be >1
unless there is >=1, but the join possibilities for >=1 (i.e. "where
exists" rather than "where (select count(*)...)>1" )  are much more
attractive than the ones for >1.
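
Sketching both ideas together -- WITH for the shared building subquery,
and cheap EXISTS guards placed in front of the full counts in the hope
that the planner evaluates them first (untested):

WITH b AS (
  SELECT osm_id, way
  FROM osm_polygon
  WHERE tags @> hstore('building','yes')
)
SELECT b.osm_id
FROM b
WHERE EXISTS (SELECT 1 FROM osm_poi p
              WHERE p.tags @> hstore('amenity','pharmacy')
                AND ST_DWithin(b.way, p.way, 1000))
  AND EXISTS (SELECT 1 FROM osm_poi p
              WHERE p.tags @> hstore('amenity','school')
                AND ST_DWithin(b.way, p.way, 1000))
  AND (SELECT count(*) FROM osm_poi p
       WHERE p.tags @> hstore('amenity','pharmacy')
         AND ST_DWithin(b.way, p.way, 1000)) > 1
  AND (SELECT count(*) FROM osm_poi p
       WHERE p.tags @> hstore('amenity','school')
         AND ST_DWithin(b.way, p.way, 1000)) > 1;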

Cheers,

Jeff



[PERFORM] Postgresql - performance of using array in big database

2012-08-07 Thread roberthanco
Hello

Let's say we have a table with 6 million records. There are 16 integer
columns and a few text columns. It is a read-only table, so every integer
column has an index. Every record is around 50-60 bytes.
The table name is "Item".
The server is: 12 GB RAM, 1.5 TB SATA, 4 CORES. The whole server is for
postgres. There are many more tables in this database, so RAM does not
cover the whole database.
I want to add to the table "Item" a column "a_elements" (an array of big
integers). Every record would have no more than 50-60 elements in this
column.
After that I would create a GIN index on this column, and a typical
select should look like this:

select * from item where ... and ARRAY[5] <@ a_elements;

I also have a second, more classical, option: do not add the column
a_elements to the table item, but create a table elements with two
columns:

  id_item
  id_element

This table would have around 200 million records. I am able to do
partitioning on these tables, so the number of records would reduce to
20 million in table elements and 500 K in table item.
The second option's select looks like this:

select item.*
from item
    left join elements on (item.id_item = elements.id_item)
where ... and 5 = elements.id_element

I wonder which option would be better from a performance point of view.
Is Postgres able to combine the GIN index with the other indexes
(option 1) in a single query?
I need to make a good decision because importing this data will take me
20 days.
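
For reference, I understand the index for option 1 would be created like
this (sketch):

create index item_a_elements_gin on item using gin (a_elements);

and that option 2 would additionally want an index on elements
(id_element, id_item) so the join can filter by element cheaply.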

[PERFORM] Postgres Upgrade from 8.4 to 9.1

2012-08-07 Thread Rajiv Kasera
Hi,

I am planning a Postgres migration from 8.4 to 9.1 to be able to leverage
the replication features available in the 9.1 version. I would like to
understand the following things in this regard:

1. Any good documentation which should help in this upgrade.

2. To be able to rehearse the complete steps in a test environment (which
is running 9.0) before doing it in LIVE: is it possible to revert this
database to 8.4 and then upgrade to 9.1?

3. Any known issues and changes required to be done in the application
for this upgrade.

Thanks,
Rajiv



Re: [PERFORM] Postgresql - performance of using array in big database

2012-08-07 Thread Craig Ringer

On 08/03/2012 05:14 PM, robertha...@o2.pl wrote:

> It is a read-only table, so every integer column has an index.

First tip: Define the table without the indexes. INSERT your data, and 
only after it is inserted create your indexes.
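
A minimal sketch of that pattern (column names and file path are
hypothetical):

CREATE TABLE item (
    id_item bigint,
    c1      integer,  -- ... plus the other integer and text columns
    descr   text
);                                          -- note: no indexes yet
COPY item FROM '/tmp/item.csv' WITH CSV;    -- bulk load
CREATE INDEX item_c1_idx ON item (c1);      -- indexes built after the load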


Similarly, if you're making huge changes to the table you should 
consider dropping the indexes, making the changes, and re-creating the 
indexes. You might not have to drop the indexes if you aren't changing 
indexed fields, since HOT might save you, but it depends a lot on the 
specifics of the table's on-disk layout etc.



> The server is: 12 GB RAM, 1.5 TB SATA, 4 CORES. The whole server is for
> postgres. There are many more tables in this database, so RAM does not
> cover the whole database.


OK, in that case more info on the disk subsystem is generally helpful. 
Disk spin speed, type? RAID configuration if any? eg:


  4 x 750GB 7200RPM Western Digital  Black SATA 3 HDDs in RAID 10 using 
the Linux 'md' raid driver


or

  2 x 1.5TB 7200RPM "Enterprise/near-line" SATA3 HDDs in RAID 1 using a
Dell PERC controller with BBU in write-back cache mode.


... though if you're only bulk-inserting the BBU doesn't matter much.

> I wonder which option would be better from a performance point of view.


I would advise you to test on a subset of your data. Try loading the 
same 50,000 records into different databases, one with each structure. 
Measure how long the load takes for each design, and how long the 
queries you need to run take to execute. Repeat the process with 500,000 
records and see if one design slows down more than the other design 
does. Etc.


> I need to make a good decision because importing this data will take me
> 20 days.


For the sheer size of data you have you might want to think about using 
pg_bulkload. If you can't or don't want to do that, then at least use 
COPY to load big batches of your data.


--
Craig Ringer


Re: [PERFORM] Postgres Upgrade from 8.4 to 9.1

2012-08-07 Thread Craig Ringer

On 08/06/2012 01:38 PM, Rajiv Kasera wrote:
> I am planning a Postgres migration from 8.4 to 9.1 to be able to
> leverage the replication features available in the 9.1 version. I
> would like to understand the following things in this regard:
>
> 1. Any good documentation which should help in this upgrade.

The most important documentation here is the release notes for the major
.0 versions:

http://www.postgresql.org/docs/current/static/release-9-1.html
http://www.postgresql.org/docs/current/static/release-9-0.html

... in particular the "Migration to" sections.

> 2. To be able to rehearse the complete steps in a test environment
> (which is running 9.0) before doing it in LIVE: is it possible to
> revert this database to 8.4 and then upgrade to 9.1?


No. Take a copy before upgrading. Always. Keep the copy read-only in 
some safe place.


If you want to revert, make a copy of your backup and use that.

Upgrades from 8.4 to 9.0 or 9.1 require a dump and reload or the use of 
the pg_upgrade tool. You can't just install the new version and start it 
on your old database.
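
For the dump-and-reload route the skeleton is roughly this (ports are
hypothetical; preferably run the 9.1 pg_dumpall against the old server):

pg_dumpall -p 5432 > dump.sql         # dump the old 8.4 cluster
psql -p 5433 -f dump.sql postgres     # restore into the new 9.1 cluster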


> 3. Any known issues and changes required to be done in the application
> for this upgrade.




See the release notes.

--
Craig Ringer