[PERFORM] Why should such a simple query over indexed columns be so slow?

2012-01-30 Thread Alessandro Gagliardi
inserts and selects. As for *WAL Configuration*: I'm afraid I don't even know what that is. The query is normally run from a Python web server, though the above EXPLAIN was run using pgAdmin3; I doubt that's relevant. As for *GUC Settings*: again, I don't know what this is. Whatever Heroku defaults to is what I'm using. Thank you in advance! -Alessandro Gagliardi
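GUC settings are PostgreSQL's runtime configuration parameters, and they can be inspected from any session without special privileges. A minimal sketch (the parameter list here is just the handful most relevant to this discussion):

    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('random_page_cost', 'seq_page_cost',
                   'work_mem', 'effective_cache_size', 'shared_buffers');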

Re: [PERFORM] Why should such a simple query over indexed columns be so slow?

2012-01-30 Thread Alessandro Gagliardi
I think I'll start there. Thank you very much! -Alessandro On Mon, Jan 30, 2012 at 11:24 AM, Claudio Freire wrote: > On Mon, Jan 30, 2012 at 4:13 PM, Alessandro Gagliardi > wrote: > > So, here's the query: > > > > SELECT private, COUNT(block_id) FROM block

Re: [PERFORM] Why should such a simple query over indexed columns be so slow?

2012-01-30 Thread Alessandro Gagliardi
wrote: > On Mon, Jan 30, 2012 at 5:35 PM, Alessandro Gagliardi > wrote: > > To answer your (non-)question about Heroku, it's a cloud service, so I > don't > > host PostgreSQL myself. I'm not sure how much I can mess with things like > > GUC since I don't

Re: [PERFORM] Why should such a simple query over indexed columns be so slow?

2012-01-30 Thread Alessandro Gagliardi
On Mon, Jan 30, 2012 at 1:25 PM, Josh Berkus wrote: > You can do "SHOW random_page_cost" yourself right now, too. > > 4 I also tried "SHOW seq_page_cost" and that's 1. Looking at http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-RANDOM-PAGE-COST I wonder if I should try
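The experiment being described, as a session-level sketch (values taken from the thread; a plain SET only lasts for the current session):

    SHOW random_page_cost;      -- 4, the default
    SHOW seq_page_cost;         -- 1, the default
    SET random_page_cost = 2;   -- session-only; resets on disconnect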

Re: [PERFORM] Why should such a simple query over indexed columns be so slow?

2012-01-30 Thread Alessandro Gagliardi
You're saying that enable_seqscan is determining whether or not the data is being cached? On Mon, Jan 30, 2012 at 1:13 PM, Fernando Hevia wrote: > > On Mon, Jan 30, 2012 at 17:35, Alessandro Gagliardi > wrote: > >> Well that was a *lot* faster: >> >> "HashAggregate

Re: [PERFORM] Why should such a simple query over indexed columns be so slow?

2012-01-30 Thread Alessandro Gagliardi
I set random_page_cost to 2 (with enable_seqscan on) and get the same performance I got with enable_seqscan off. So far so good. Now I just need to figure out how to set it globally. :-/ On Mon, Jan 30, 2012 at 1:45 PM, Scott Marlowe wrote: > On Mon, Jan 30, 2012 at 2:39 PM, Alessandro Gaglia
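One way to make the setting global on a hosted database, where postgresql.conf is out of reach, is to attach it to the database itself — a sketch, with the database name assumed:

    -- applies to all new connections to this database
    ALTER DATABASE mydb SET random_page_cost = 2;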

Re: [PERFORM] Why should such a simple query over indexed columns be so slow?

2012-01-30 Thread Alessandro Gagliardi
Got it (with a little bit of klutzing around). :) Thanks! On Mon, Jan 30, 2012 at 2:24 PM, Scott Marlowe wrote: > On Mon, Jan 30, 2012 at 3:19 PM, Scott Marlowe > wrote: > > On Mon, Jan 30, 2012 at 2:55 PM, Alessandro Gagliardi > > wrote: > >> I set random_page_cost

[PERFORM] From Simple to Complex

2012-01-31 Thread Alessandro Gagliardi
still don't know. Heroku hosts the database on Amazon's servers, so maybe that answers the question? GUC Settings: As per yesterday's discussion, I reduced random_page_cost to 2. Other than that, it's all default. Bonus question: If that was too simple, here's something

Re: [PERFORM] From Simple to Complex

2012-01-31 Thread Alessandro Gagliardi
ldren." I should have realized that as I exploited that "limitation" in three of my tables. Gradually adding those indices now; will report on what kind of difference it makes On Tue, Jan 31, 2012 at 1:22 PM, Alessandro Gagliardi wrote: > My slow query today is somewhat more c

Re: [PERFORM] From Simple to Complex

2012-01-31 Thread Alessandro Gagliardi
actual time=0.012..0.012 rows=0 loops=1)" " Index Cond: ((inserted >= '2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <= '2012-01-31 00:00:00'::timestamp without time zone))" " ->

Re: [PERFORM] From Simple to Complex

2012-01-31 Thread Alessandro Gagliardi
moments.moment_id GROUP BY relname, emotion ORDER BY relname, emotion; That was a bit faster, but still very slow. Here's the EXPLAIN: http://explain.depesz.com/s/ZdF On Tue, Jan 31, 2012 at 2:53 PM, Alessandro Gagliardi wrote: > I changed the query a bit so the results would not change

Re: [PERFORM] From Simple to Complex

2012-02-01 Thread Alessandro Gagliardi
I assume that's because I'm hitting a memory limit and paging out. Is that right? On Tue, Jan 31, 2012 at 3:43 PM, Alessandro Gagliardi wrote: > I just got a pointer on presenting EXPLAIN ANALYZE in a more human > friendly fashion (thanks, Agent M!): http://explain.depesz.com/

Re: [PERFORM] From Simple to Complex

2012-02-01 Thread Alessandro Gagliardi
Scan. I've already lowered random_page_cost to 2. Maybe I should lower it to 1.5? Actually 60K should be plenty for my purposes anyway. On Wed, Feb 1, 2012 at 10:35 AM, Scott Marlowe wrote: > On Wed, Feb 1, 2012 at 11:19 AM, Alessandro Gagliardi > wrote: > > Interestingly, inc

Re: [PERFORM] From Simple to Complex

2012-02-01 Thread Alessandro Gagliardi
On Wed, Feb 1, 2012 at 11:04 AM, Bob Lunney wrote: > Possibly. What does > > psql > show work_mem; > > say? > > 100MB

Re: [PERFORM] From Simple to Complex

2012-02-02 Thread Alessandro Gagliardi
On Thu, Feb 2, 2012 at 6:52 AM, Merlin Moncure wrote: > also, is effective_cache_size set to a reasonable value? > > Yeah, it's 153kB
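effective_cache_size allocates nothing; it only tells the planner how much data is likely to be cached by the OS, and 153kB is far below any sensible value. A sketch of checking and raising it (database name and target value assumed):

    SHOW effective_cache_size;
    -- common guidance is roughly 50-75% of the machine's RAM
    ALTER DATABASE mydb SET effective_cache_size = '4GB';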

[PERFORM] timestamp with time zone

2012-02-09 Thread Alessandro Gagliardi
Here's my query: SELECT foursquare.name, foursquare.city, COUNT(moment_id) AS popularity FROM foursq_categories JOIN foursquare USING (foursq_id) JOIN places USING (foursq_id) JOIN blocks USING (block_id) WHERE "primary" AND (created at time zone timezone)::date = 'yesterday' AND (country = 'U

Re: [PERFORM] timestamp with time zone

2012-02-09 Thread Alessandro Gagliardi
ON blocks USING btree (networks ); CREATE INDEX blocks_private_idx ON blocks USING btree (private ); CREATE INDEX blocks_shared_idx ON blocks USING btree (shared ); CREATE INDEX blocks_timezone_idx ON blocks USING btree (timezone ); On Thu, Feb 9, 2012 at 11:46 AM, To

Re: [PERFORM] timestamp with time zone

2012-02-09 Thread Alessandro Gagliardi
blocks_created_date_idx (or blocks_created_at_timezone_idx). How do I make that happen? On Thu, Feb 9, 2012 at 12:15 PM, Kevin Grittner wrote: > Alessandro Gagliardi wrote: > > > (Actually, I originally did try one on "(created at time zone > > timezone)::date" but couldn't figure out how to
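An expression index matching the query's predicate would look like the sketch below. This assumes created is timestamptz, so that created AT TIME ZONE timezone yields a plain timestamp and the whole expression is immutable; otherwise CREATE INDEX will refuse it:

    CREATE INDEX blocks_created_date_idx
        ON blocks (((created AT TIME ZONE timezone)::date));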

Re: [PERFORM] timestamp with time zone

2012-02-10 Thread Alessandro Gagliardi
On Thu, Feb 9, 2012 at 10:19 PM, Tom Lane wrote: > Alessandro Gagliardi writes: > > Still slow as mud: http://explain.depesz.com/s/Zfn > > Now I've got indices on created, timezone, created at time zone timezone, > > and (created at time zone timezone)::date. Clearly the p

[PERFORM] Why so slow?

2012-02-17 Thread Alessandro Gagliardi
Comparing SELECT DISTINCT(user_id) FROM blocks JOIN seen_its USING (user_id) WHERE seen_its.created BETWEEN (now()::date - interval '8 days')::timestamp AND now()::date::timestamp to SELECT DISTINCT(user_id) FROM seen_its WHERE created BETWEEN (now()::date - interval '8 days')::timestamp AND now():
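The archive truncates the follow-ups, but for a range filter like seen_its.created BETWEEN ... the usual first check — not necessarily the fix the list arrived at — is whether a plain btree index covers the column. A sketch, index name assumed:

    CREATE INDEX seen_its_created_idx ON seen_its (created);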

Re: [PERFORM] Why so slow?

2012-02-17 Thread Alessandro Gagliardi
> On 02/17/2012 10:34 AM, Alessandro Gagliardi wrote: > >> Comparing >> SELECT DISTINCT(user_id) FROM blocks JOIN seen_its USING (user_id) WHERE >> seen_its.created BETWEEN (now()::date - interval '8 days')::timestamp AND >> now()::date::timestamp >> to

Re: [PERFORM] Why so slow?

2012-02-20 Thread Alessandro Gagliardi
Ah, that did make a big difference! It went from taking 10x as long to taking only 1.5x as long (about what I would have expected, if not better). Thank you! On Fri, Feb 17, 2012 at 9:29 PM, Ants Aasma wrote: > On Feb 17, 2012 8:35 PM, "Alessandro Gagliardi" > wrote: >

[PERFORM] Indexes and Primary Keys on Rapidly Growing Tables

2012-02-20 Thread Alessandro Gagliardi
people think that this might be a viable solution or if I'm barking up the wrong tree. Thanks! -Alessandro On Fri, Feb 17, 2012 at 10:34 AM, Alessandro Gagliardi wrote: > CREATE TABLE seen_its ( > user_id character(24) NOT NULL, > moment_id character(24) NOT NULL, > created

Re: [PERFORM] Indexes and Primary Keys on Rapidly Growing Tables

2012-02-21 Thread Alessandro Gagliardi
> On 2/20/12 2:06 PM, Alessandro Gagliardi wrote: > > . But first I just want to know if people > > think that this might be a viable solution or if I'm barking up the wrong > > tree. > > Batching is usually helpful for inserts, especially if there's a unique >

Re: [PERFORM] Indexes and Primary Keys on Rapidly Growing Tables

2012-02-21 Thread Alessandro Gagliardi
True. I implemented the SAVEPOINTs solution across the board. We'll see what kind of difference it makes. If it's fast enough, I may be able to do without that. On Tue, Feb 21, 2012 at 3:53 PM, Samuel Gendler wrote: > > On Tue, Feb 21, 2012 at 9:59 AM, Alessandro Gagliardi >
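The SAVEPOINT pattern referred to, sketched as a psql-style session (columns taken from the earlier CREATE TABLE; the values are placeholders):

    BEGIN;
    SAVEPOINT bulk;
    INSERT INTO seen_its (user_id, moment_id, created)
        VALUES ('some-user-id', 'some-moment-id', now());
    -- if the INSERT raises unique_violation, the client issues
    -- ROLLBACK TO SAVEPOINT bulk; and moves on to the next row,
    -- instead of the whole batch aborting
    RELEASE SAVEPOINT bulk;
    COMMIT;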

[PERFORM] set autovacuum=off

2012-02-23 Thread Alessandro Gagliardi
I have a database where I virtually never delete and almost never do updates. (The updates might change in the future but for now it's okay to assume they never happen.) As such, it seems like it might be worth it to set autovacuum=off or at least make it so vacuuming hardly ever occurs. Actually,
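Rather than turning autovacuum off globally, it can be disabled per table — a sketch, table name assumed (note that anti-wraparound vacuums will still run eventually regardless):

    ALTER TABLE seen_its SET (autovacuum_enabled = false);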

Re: [PERFORM] set autovacuum=off

2012-02-23 Thread Alessandro Gagliardi
I'd probably get the same 55P02 error if I tried to change them.) On Thu, Feb 23, 2012 at 7:18 AM, Andy Colson wrote: > On 2/23/2012 6:34 AM, Thom Brown wrote: > >> On 22 February 2012 23:50, Alessandro Gagliardi >> wrote: >> >>> I have a database where I v

Re: [PERFORM] set autovacuum=off

2012-02-23 Thread Alessandro Gagliardi
tuning a database that is being used in this way. Any cache sizes I should be messing with? Etc. Thank you, -Alessandro On Thu, Feb 23, 2012 at 9:45 AM, Thom Brown wrote: > On 23 February 2012 17:35, Alessandro Gagliardi > wrote: > > I should have been more clear. I virtually never del

Re: [PERFORM] set autovacuum=off

2012-02-23 Thread Alessandro Gagliardi
On Thu, Feb 23, 2012 at 10:01 AM, Steve Crawford < scrawf...@pinpointresearch.com> wrote: > The documentation has information like "This parameter can only be set in > the postgresql.conf file or on the server command line." that will tell > you in advance which settings will fail when you attempt

Re: [PERFORM] set autovacuum=off

2012-02-23 Thread Alessandro Gagliardi
be looking for here. On Thu, Feb 23, 2012 at 10:42 AM, Peter van Hardenberg wrote: > On Thu, Feb 23, 2012 at 10:38 AM, Alessandro Gagliardi > wrote: > > around the same time as disabling auto-vacuum, so that could account for > the > > coincidental speed up). I'm not

Re: [PERFORM] set autovacuum=off

2012-02-23 Thread Alessandro Gagliardi
On Thu, Feb 23, 2012 at 11:07 AM, Andy Colson wrote: > That depends on if you have triggers that are doing selects. But in > general you are correct, analyze wont help inserts. > > I do have some, actually. I have a couple trigger functions like: CREATE OR REPLACE FUNCTION locations_quiet_unique
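The archive cuts the function off; a sketch of what a "quiet unique" BEFORE INSERT trigger of this kind typically looks like (function, trigger, and column names assumed):

    CREATE OR REPLACE FUNCTION locations_quiet_unique_violation()
    RETURNS trigger AS $$
    BEGIN
        -- this SELECT is why ANALYZE still matters on an insert-only
        -- table: it is planned against the table's statistics
        IF EXISTS (SELECT 1 FROM locations WHERE location_id = NEW.location_id) THEN
            RETURN NULL;  -- silently skip the duplicate row
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER locations_quiet_unique
        BEFORE INSERT ON locations
        FOR EACH ROW EXECUTE PROCEDURE locations_quiet_unique_violation();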

Re: [PERFORM] set autovacuum=off

2012-02-23 Thread Alessandro Gagliardi
On Thu, Feb 23, 2012 at 11:26 AM, Steve Crawford < scrawf...@pinpointresearch.com> wrote: > You need to rethink things a bit. Databases can fail in all sorts of ways > and can slow down during bursts of activity, data dumps, etc. You may need > to investigate some form of intermediate buffering

Re: [PERFORM] set autovacuum=off

2012-02-23 Thread Alessandro Gagliardi
On Thu, Feb 23, 2012 at 1:37 PM, Steve Crawford < scrawf...@pinpointresearch.com> wrote: > It's possible that you might get a nice boost by wrapping the inserts into > a transaction: > begin; > insert into...; > insert into...; > insert into...; > ... > commit; > > This only requires all that disk

Re: [PERFORM] set autovacuum=off

2012-02-23 Thread Alessandro Gagliardi
On Thu, Feb 23, 2012 at 1:11 PM, Peter van Hardenberg wrote: > My hunch is still that your issue is lock contention. > > How would I check that? I tried looking at pg_locks but I don't know what to look for. > We have many customers who do much more than this throughput, though > I'm not sure wh
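Ungranted entries in pg_locks are the usual sign of lock contention; a sketch of what to look for:

    -- rows with granted = false are waiting on a lock someone else holds
    SELECT locktype, relation::regclass, mode, pid, granted
    FROM pg_locks
    WHERE NOT granted;

An empty result while the system is slow suggests the bottleneck is elsewhere.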

[PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Alessandro Gagliardi
Hi folks, I have a system that racks up about 40M log lines per day. I'm able to COPY the log files into a PostgreSQL table that looks like this: CREATE TABLE activity_unlogged ( user_id character(24) NOT NULL, client_ip inet, hr_timestamp timestamp without time zone, locale character varying

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Alessandro Gagliardi
that assumption? On Thu, Mar 1, 2012 at 10:40 AM, Peter van Hardenberg wrote: > On Thu, Mar 1, 2012 at 10:27 AM, Alessandro Gagliardi > wrote: > > Now, I want to reduce that data to get the last activity that was > performed > > by each user in any given hour. It should fit i
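One standard way to express "last activity per user per hour" — not necessarily what the list settled on — is DISTINCT ON, sketched here with the columns from the CREATE TABLE above and a target table name assumed:

    INSERT INTO activity
    SELECT DISTINCT ON (user_id, date_trunc('hour', hr_timestamp))
           user_id, client_ip, hr_timestamp, locale
    FROM activity_unlogged
    ORDER BY user_id, date_trunc('hour', hr_timestamp),
             hr_timestamp DESC;  -- the latest row in each hour wins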

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Alessandro Gagliardi
though. On Thu, Mar 1, 2012 at 10:51 AM, Kevin Grittner wrote: > Alessandro Gagliardi wrote: > > > hr_timestamp timestamp without time zone, > > In addition to the responses which more directly answer your > question, I feel I should point out that this will not represen

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Alessandro Gagliardi
Hah! Yeah, that might work. Except that I suck at grep. :( Perhaps that's a weakness I should remedy. On Thu, Mar 1, 2012 at 10:35 AM, Craig James wrote: > On Thu, Mar 1, 2012 at 10:27 AM, Alessandro Gagliardi > wrote: > > Hi folks, > > > > I have a system

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Alessandro Gagliardi
Interesting solution. If I'm not mistaken, this does solve the problem of having two entries for the same user at the exact same time (which violates my pk constraint) but it does so by leaving both of them out (since there is no au1.hr_timestamp > au2.hr_timestamp in that case). Is that right? On

Re: [PERFORM] efficient data reduction (and deduping)

2012-03-01 Thread Alessandro Gagliardi
Ah, yes, that makes sense. Thank you! On Thu, Mar 1, 2012 at 11:39 AM, Claudio Freire wrote: > On Thu, Mar 1, 2012 at 4:35 PM, Alessandro Gagliardi > wrote: > > Interesting solution. If I'm not mistaken, this does solve the problem of > > having two entries for the same