Re: [GENERAL] Strategies/Best Practices Handling Large Tables

2012-10-21 Thread Chitra Creta
Thank you all.

Ryan, would you mind sharing your one-time function to move it?

Merlin, what are your suggestions to improve query performance?

Shaun, thank you. I will look into facts and dimensions should all else
fail.

Chris, would you mind giving me an example of what you mean by your log,
aggregate and snapshot approach? Also, with indexing, I believe composite
and partial indexes are better than plain indexes, am I correct? Do you
have any recommendations as to which type (e.g. btree, hash) is better for
which situations?

Thank you,
Chitra



On Thu, Oct 18, 2012 at 12:47 AM, Chris Travers wrote:

>
>
> On Fri, Oct 12, 2012 at 7:44 AM, Chitra Creta wrote:
>
>> Hi,
>>
>> I currently have a table that is growing very quickly - i.e. 7 million
>> records in 5 days. This table acts as a placeholder for statistics, and
>> hence the records are merely inserted and never updated or deleted.
>>
>> Many queries are run on this table to obtain trend analysis. However,
>> these queries are now starting to take a very long time (hours) to execute
>> due to the size of the table.
>>
>> I have put indexes on this table, to no significant benefit.  Some of the
>> other strategies I have thought of:
>> 1. Purge old data
>> 2. Reindex
>> 3. Partition
>> 4. Creation of daily, monthly, and yearly summary tables that contain
>> aggregated data specific to the statistics required
>>
>> Does anyone know the best practice for handling this situation?
>>
>
> The answer is: well, it depends.  Possibly some combination.
>
> One approach I like, which may or may not be included in #4, is the idea
> of summary tables which contain snapshots of the data, allowing you to
> roll forward or backward from defined points.  This is what I call the log,
> aggregate, and snapshot approach.  But it really depends on what you are
> doing, and there is no one-size-fits-all approach at this volume.
>
> Instead of reindexing, I would suggest also looking into partial indexes.
>
> Best Wishes,
>


Re: [GENERAL] Strategies/Best Practices Handling Large Tables

2012-10-21 Thread Chris Travers
On Sun, Oct 21, 2012 at 5:46 AM, Chitra Creta  wrote:

> Thank you all.
>
> Ryan, would you mind sharing your one-time function to move it?
>
> Merlin, what are your suggestions to improve query performance?
>
> Shaun, thank you. I will look into facts and dimensions should all else
> fail.
>
> Chris, would you mind giving me an example of what you mean by your log,
> aggregate and snapshot approach? Also, with indexing, I believe composite
> and partial indexes are better than plain indexes, am I correct? Do you
> have any recommendations as to which type (e.g. btree, hash) is better for
> which situations?
>

Sure.  Suppose I have an accounting system.

I may record the amounts of the transactions in a journal_entry and a
journal_line table.  These will be write once, read many.  Over time,
however, you will end up having to digest millions of records (given
sufficient volume) just to find the balance of a checking account, and this
is not really ideal.

So to deal with this, I might, for example, add a table called
account_checkpoint which might have the following fields:

account_id
end_date
debits
credits
balance

Then, on closing of books, I can snapshot the accumulated debits, credits,
and balance to date.  If I need any of these numbers I can just grab the
appropriate row from account_checkpoint and roll forward from end_date.  If
I have too much volume I can have closings at a monthly level or whatever.
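
For example, a minimal sketch (the journal_line columns account_id, amount,
and entry_date are illustrative, not from a real schema):

create table account_checkpoint (
    account_id int not null,
    end_date   date not null,
    debits     numeric not null,
    credits    numeric not null,
    balance    numeric not null,
    primary key (account_id, end_date)
);

-- Close the books for October: fold everything up to the closing date
-- into one row per account.  Assumes amounts are signed: debits
-- positive, credits negative.
insert into account_checkpoint
       (account_id, end_date, debits, credits, balance)
select account_id,
       date '2012-10-31',
       sum(case when amount > 0 then amount else 0 end),
       sum(case when amount < 0 then -amount else 0 end),
       sum(amount)
  from journal_line
 where entry_date <= date '2012-10-31'
 group by account_id;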

The checkpoint table contains sufficient information for me to start a
report at any point and end it at any other point without having to scan
the intervening checkpointed periods.  Additionally, if I want to purge old
data, I can do so without losing current balance information.
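
Rolling forward is then cheap.  For example, the current balance of one
account (again a sketch against the same illustrative schema):

select c.balance + coalesce(sum(j.amount), 0) as current_balance
  from account_checkpoint c
  left join journal_line j
         on j.account_id = c.account_id
        and j.entry_date > c.end_date
 where c.account_id = 42
   and c.end_date = (select max(end_date)
                       from account_checkpoint
                      where account_id = 42)
 group by c.balance;

Only the journal lines written since the last checkpoint are scanned, no
matter how large the table grows.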

So what this approach does, in essence, is give you a way to purge without
losing the aggregated information, and a way to skip portions of the table
for aggregation purposes that you cannot otherwise skip.  The big caveat is
that you cannot insert (and if this is in doubt, you need to enforce this
with a trigger) any records effective before the most recent checkpoint.
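
Such a trigger might look something like this (a sketch only; names are
again illustrative):

create function forbid_backdated_lines() returns trigger as $$
begin
    -- Reject any line dated on or before the account's latest checkpoint.
    if new.entry_date <= (select coalesce(max(end_date), date '-infinity')
                            from account_checkpoint
                           where account_id = new.account_id)
    then
        raise exception 'journal line predates last checkpoint for account %',
                        new.account_id;
    end if;
    return new;
end;
$$ language plpgsql;

create trigger enforce_checkpoint
    before insert on journal_line
    for each row execute procedure forbid_backdated_lines();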

Best Wishes,
Chris Travers


[GENERAL] variadic function, query "in", help with syntax/function

2012-10-21 Thread Scott Ribe
Briefly, what would it take to make the following work?

create function getbatch (variadic ids int8[]) returns setof foobar as $$
begin
    return query
    select * from foobar where id in (ids);
end;
$$ language plpgsql;

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice








Re: [GENERAL] variadic function, query "in", help with syntax/function

2012-10-21 Thread Pavel Stehule
Hello

2012/10/21 Scott Ribe :
> Briefly, what would it take to make the following work?
>
> create function getbatch (variadic ids int8[]) returns setof foobar as $$
> begin
> return query
> select * from foobar where id in (ids);
> end;
> $$ language plpgsql;
>

create function getbatch (variadic ids int8[]) returns setof foobar as $$
begin
    return query
    select * from foobar where id = any (ids);
end;
$$ language plpgsql;

Note: "in (ids)" compares id against the whole array as a single scalar
value, which fails with a type error, while "= any (ids)" tests membership
in the array.  Also, for these single-statement functions, the SQL language
is better.
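
For example, a plain SQL version of the same function might look like:

create function getbatch (variadic ids int8[]) returns setof foobar as $$
    select * from foobar where id = any (ids);
$$ language sql;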

regards

Pavel Stehule


> --
> Scott Ribe
> scott_r...@elevated-dev.com
> http://www.elevated-dev.com/
> (303) 722-0567 voice




Re: [GENERAL] variadic function, query "in", help with syntax/function

2012-10-21 Thread Scott Ribe
On Oct 21, 2012, at 11:01 AM, Pavel Stehule wrote:

> Hello
> 
> 2012/10/21 Scott Ribe :
>> Briefly, what would it take to make the following work?
>> 
>> create function getbatch (variadic ids int8[]) returns setof foobar as $$
>> begin
>>return query
>>select * from foobar where id in (ids);
>> end;
>> $$ language plpgsql;
>> 
> 
> create function getbatch (variadic ids int8[]) returns setof foobar as $$
> begin
>return query
>select * from foobar where id = any (ids);
> end;
> $$ language plpgsql;

Ah, thanks, I didn't know ANY worked in that context--I've just used it
(and only seen examples) the other way: const_id = any(somecol).
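
For illustration, reusing the example names from this thread, the
right-hand side of = ANY just needs to be an array expression, whatever
its source:

-- The array may equally be a literal, a function parameter such as ids,
-- or an array-typed column such as somecol.
select * from foobar where id = any (array[1, 2, 3]);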

> note, for these single statement function, sql language is better

I greatly simplified it for the question; it's kind of a nasty function that's 
multiple unions of multiple joins.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





