Hi,
I have a simple table that has indexes on 2 integer columns.
Data is inserted very often (no updates, no deletes, just inserts):
at least 4000/5000 rows per second.
The input for the 2 indexed columns is very random.
Everything is "fine" for the first 10-20M rows; after that, performance
gets much worse.
> Does it help to reindex the index at that point?
Didn't try; I guess a reindex of such a big table
would block inserts for a long time... but I'll try.
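(One thing I might try instead, if plain REINDEX blocks too long: build a
replacement index without blocking inserts, then swap it in. Index and table
names below are made up, just a sketch:)

    -- rebuilds without the write lock that REINDEX would take
    create index concurrently idx_a_new on mytable (a);
    drop index idx_a;                        -- brief exclusive lock only
    alter index idx_a_new rename to idx_a;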
> Bad. The partitioning code isn't designed to scale
> beyond a few dozen partitions.
What kind of problems am I going to experience?
> On a few very narrow applications I've gotten good
> performance in the
> low hundreds. After that things fall apart
> quickly.
Ehm... what exactly does "fall apart quickly" mean?
I can trade some "select" speed for "insert" speed...
I don't have experience with partitioning; if some of
you already do, I'd like to hear about it.
> The usual set of tricks is to
> increase shared_buffers, checkpoint_segments, and checkpoint_timeout to cut
> down on checkpoint overhead.
Uh, didn't know shared_buffers played a role in index insertion as well...
got to try that. Thank you.
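For reference, the kind of change being suggested (the values here are just a
starting point for an insert-heavy box, my own assumptions; tune to your RAM
and I/O):

    # postgresql.conf -- example starting values
    shared_buffers = 2GB           # the default is far too small for this load
    checkpoint_segments = 64       # fewer, larger checkpoints (pre-9.5 setting)
    checkpoint_timeout = 15min     # spread checkpoint I/O out over time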
> The usual set of tricks is to
> increase shared_buffers,
> checkpoint_segments, and checkpoint_timeout to cut down on checkpoint overhead.
That did it. Setting a much higher shared_buffers helped quite a lot.
Thank you everybody for your suggestions.
Hi,
increasing shared_buffers has improved the number of inserts per second *a lot*,
so my "problem" [1] is fixed.
But now I'm worried because of the sentence (Tom Lane):
"The partitioning code isn't designed to scale beyond a few dozen partitions"
Is it mainly a planning problem or an execution-time problem?
> The thing that takes the longest is planning queries. I made THAT problem
> just
> go away for the most part by using cached queries (only works within the same
> database connection, but that's no problem for me).
What do you mean by "cached queries"? Prepared statements?
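If it is prepared statements, the idea in plain SQL is this (table and column
names here are made up, just for illustration):

    prepare ins (integer, integer) as
        insert into mytable (col1, col2) values ($1, $2);
    execute ins(1, 2);
    execute ins(3, 4);
    deallocate ins;

The plan is built once at PREPARE time and reused by every EXECUTE on that
connection, which sidesteps the per-query planning cost.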
I'm sorry I have to come back to this, but the solution the list gave helped,
but didn't fully solve my problems...
To sum up:
I have a simple table that has indexes on 2 integer columns.
Data is inserted very often (no updates, no deletes, just inserts): at
least 4000/5000 rows per second. The input for the 2 indexed columns is very random.
Hi all,
we're going to deploy a web app that manages users/roles for another
application.
We want the database to be "safe" from changes made by malicious
users.
I guess our options are:
1) have the db listen only on local connections; basically when the
machine is accessed, the db could be "compromised" as well (see the sketch below)
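For option 1, the relevant settings would be something like this (a sketch;
the auth method and addresses are my assumptions):

    # postgresql.conf
    listen_addresses = 'localhost'   # no TCP connections from other hosts

    # pg_hba.conf -- only local connections, password-checked
    local   all   all                  md5
    host    all   all   127.0.0.1/32   md5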
> Personally I would lean toward making
> the bulk of security within the
> application so as to simplify everything - the
> database would do what it
> does best - store and manipulate data - and the
> application would be the
> single point of entry. Protect the servers - keep
> the applications up to date.
> I think this point number 2 is pretty important. If at all possible, keep
> the webapp separate from the database, and keep the database
> server on a fairly restrictive firewall. This means that someone has
> got to get into the webapp, then hop to the database server; it just
> adds another layer.
At this page:
http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY
I read:
"The automatic hierarchy loading code is currently integrated
in the code of the COPY command of Postgres 8.5"
Is that true?
Hi all,
I have a very big table (2000 inserts per sec, I have to store 20 days of data).
The table has 2 indexes, on columns that have almost-random values.
Since keeping those two indexes up-to-date can't be done (updating 2
indexes with random values 2000 times per second on such a huge table is
just too slow), I'm thinking about partitioning the table by time.
> Could you please explain the reason to do so many
> partitions?
Because otherwise there would be tons of rows in each
partition, and randomly "updating" the index for that many
rows 2000 times per second isn't doable (the indexes
get so big that it would be like writing to a multi-GB file
at random offsets).
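For a sense of scale (assuming a steady 2000 rows/sec):

    2000 rows/s * 86400 s/day * 20 days ≈ 3.5 billion rows in total
    2000 rows/s *  7200 s (2 hours)     ≈ 14.4 million rows per bi-hourly partition

so with fine-grained partitions, each partition's indexes stay small enough
to have a chance of fitting in cache.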
> Well, the situation is still ambiguous,
> so:
> Is it possible to provide this table and indexes definitions?
> And it
> would be great if you describe the queries you are going to do
> on this table,
> or just provide the SQL.
Sure!
Basically what I'm trying to do is to partition the indexes into
time-based chunks.
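Something along these lines (the names are invented, but the shape matches the
description: two almost-random integer columns, both indexed, plus a timestamp):

    create table p (
        d timestamp not null default now(),  -- insert time
        a integer   not null,                -- almost-random, indexed
        b integer   not null,                -- almost-random, indexed
        i integer   not null                 -- value summed in the reports
    );
    create index p_a_idx on p (a);
    create index p_b_idx on p (b);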
> For "inserts" I do not see the reason
> why
> it would be better to use index partitioning because AFAIK
> b-tree
> would behave exactly the same in both cases.
No: when the index gets very big, inserting random values gets
very slow.
But still, my approach doesn't work, because Postgres doesn't
behave the way I thought it would.
> AFAIU the OP is trying to give the cache a chance of
> doing some useful
> work by partitioning by time so it's going to be forced to
> go to disk
> less.
Exactly
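A minimal sketch of what that time partitioning looks like with 8.x-style
inheritance (the bounds, names, and bi-hourly granularity are my assumptions;
a real setup would create the children from a cron job or trigger):

    -- parent p stays empty; each child holds one time slice
    create table p_20100616_00 (
        check (d >= '2010-06-16 00:00' and d < '2010-06-16 02:00')
    ) inherits (p);
    create index p_20100616_00_a_idx on p_20100616_00 (a);
    create index p_20100616_00_b_idx on p_20100616_00 (b);

    -- inserts target the current child directly, so only its small,
    -- cache-resident indexes get touched
    insert into p_20100616_00 (d, a, b, i)
    values ('2010-06-16 01:23:45', 123, 456, 1);

    -- lets SELECTs on the parent skip children whose CHECK can't match
    set constraint_exclusion = on;

Selects through the parent still have to consider every child at plan time,
which is where the warning about scaling past a few dozen partitions bites.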
> have you
> considered a couple of
> "levels" to your hierarchy? Maybe bi-hourly (~15
> million records?)
> within the current day.
> thanks very much for your
> help.
> It gave me a good idea of what to do. If you have further
> recommendations, I
> would be glad to hear them.
I guess you should give more info about the expected
workload of your server(s)... otherwise you risk spending
too much money, or spending it in the wrong places.
> I'm trying to make a query that, given N and a date, gives me the interval of
> N hours with the max(sum(...)).

    select sum(i) as s,
           timestamp '2010-06-16 00:00:00'
             + extract(hour from d)::integer/3*3 * interval '1 hour' as sumd
    from p
    where d >= '2010-06-16' and d < '2010-06-17'
    group by extract(hour from d)::integer/3;
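To get the bucket with the biggest sum (what the question actually asked for),
just order and cut; note this only considers buckets aligned on hours 0, 3,
6, ..., not every possible 3-hour window:

    select sum(i) as s,
           timestamp '2010-06-16 00:00:00'
             + extract(hour from d)::integer/3*3 * interval '1 hour' as sumd
    from p
    where d >= '2010-06-16' and d < '2010-06-17'
    group by extract(hour from d)::integer/3
    order by s desc
    limit 1;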
> The docs are mute on this.
Not true. Read the NOTES section of
http://www.postgresql.org/docs/8.4/static/sql-cluster.html :
> VACUUM ANALYZE;
> CLUSTER;
> REINDEX DATABASE "database";
ANALYZE has to go after CLUSTER; and CLUSTER already
vacuums the tables (I'm not 100% sure though). CLUSTER also
reindexes the whole table, so there's no need for another REINDEX.
I think the right way of doing it would be:
CLUSTER;
ANALYZE;