[PERFORM] transaction delays to apply

2009-08-12 Thread Nickolay

Hello All,

I'm developing a specialized message switching system and have chosen 
PostgreSQL as the general tool to handle transactions and to store and 
manage all the data.
The system has fairly strict timing requirements. For example, it must 
process no fewer than 10 messages per second. FYI: messages are short 
(approx. 400-500 bytes each). An incoming message is routed through a 
special routing system to its destinations (i.e. one incoming message 
may be routed to dozens of channels at once).
Normally the system works very well with the PostgreSQL database; the 
performance is quite impressive.

But sometimes bad things happen (delays). For example:
I have a "log" table which contains all log entries for the system 
(warnings, errors, detailed routing info, rejections, etc.).

The table includes a "timestamp" field, which defaults to now():
CREATE TABLE log
(
 id bigserial NOT NULL,
 "timestamp" timestamp without time zone NOT NULL DEFAULT now(),
 -- ... remaining columns ...
);
So when an incoming message is processed, I start a new transaction and 
generate the outgoing and log messages within that single transaction.
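A minimal sketch of that pattern (the outgoing table and its columns 
are assumptions for illustration; only the log table and its timestamp 
default come from the schema above). Note that now() returns the 
transaction start time, so if anything earlier in the transaction runs 
slowly, the log row is inserted, and its id assigned, well after the 
timestamp it records:

BEGIN;

-- route the incoming message: one outgoing row per destination channel
INSERT INTO outgoing (channel_id, body)     -- hypothetical table and columns
VALUES (42, '...message payload...');

-- log entry; "timestamp" is filled by its DEFAULT now(), i.e. the
-- transaction start time
INSERT INTO log (message)                   -- hypothetical column
VALUES ('routed message to channel 42');

COMMIT;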

Normally, when viewing the log sorted by ID, the entries come out in 
correct time order:
ID   timestamp
1 2009-08-08 00:00:00.111
2 2009-08-08 00:00:00.211
3 2009-08-08 00:01:00.311
etc.
But occasionally a transaction seems to be delayed and its log entry 
ends up inserted out of order:

ID   timestamp
1 2009-08-08 00:00:00.111
2 2009-08-08 00:00:30.311
3 2009-08-08 00:00:00.211
Yep, that's right: sometimes by 30 seconds or even more.
I understand that some delay is to be expected, but 30 seconds is 
unacceptable!
Does anybody know a way to solve this? I monitored the system running 
at full load (~20 messages per second): the postmaster processes never 
used more than 10-20% of CPU and memory, and neither did any of my 
application's processes.


Best regards, Nick.



Re: [PERFORM] transaction delays to apply

2009-08-13 Thread Nickolay

Tom Lane wrote:

Nickolay writes:
But occasionally a transaction seems to be delayed and its log entry 
ends up inserted out of order:

ID   timestamp
1 2009-08-08 00:00:00.111
2 2009-08-08 00:00:30.311
3 2009-08-08 00:00:00.211
Yep, that's right: sometimes by 30 seconds or even more.



You haven't provided enough information to let anyone guess at the
problem.  Have you checked to see if one of the processes is blocking
on a lock, or perhaps there's a sudden spike in system load, or what?
Watching pg_stat_activity, pg_locks, and/or "vmstat 1" output during
one of these events might help narrow down what's happening.

regards, tom lane


The problem is that this happens very rarely, and NOT at full load. I 
can't monitor the system all the time. Is there any way to investigate 
the situation from the PostgreSQL logs, or to enable something like 
full debug logging? I do hold a row-level lock (SELECT ... FOR UPDATE) 
on another table during this transaction, but each row is handled by no 
more than 2 processes at once and the work should be very quick 
(SELECT, parse the data, UPDATE).
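One way to capture such rare events after the fact is PostgreSQL's own 
lock-wait and slow-statement logging; a minimal postgresql.conf sketch 
(these settings exist from 8.3 on; the thresholds are illustrative):

log_lock_waits = on                # log any lock wait longer than deadlock_timeout
deadlock_timeout = 1s              # the wait threshold used by log_lock_waits
log_min_duration_statement = 500   # log every statement slower than 500 ms

Tom's pg_locks/pg_stat_activity suggestion can also be scripted and run 
periodically while the problem is being hunted; a rough sketch against 
the 8.x catalogs (procpid/current_query column names), listing waiting 
sessions together with the sessions holding conflicting locks:

SELECT waiter.pid          AS waiting_pid,
       wq.current_query    AS waiting_query,
       holder.pid          AS holding_pid,
       hq.current_query    AS holding_query,
       holder.mode         AS held_mode
  FROM pg_locks waiter
  JOIN pg_locks holder
    ON holder.granted AND NOT waiter.granted
   AND holder.pid <> waiter.pid
   AND holder.locktype = waiter.locktype
   AND holder.relation      IS NOT DISTINCT FROM waiter.relation
   AND holder.transactionid IS NOT DISTINCT FROM waiter.transactionid
  JOIN pg_stat_activity wq ON wq.procpid = waiter.pid
  JOIN pg_stat_activity hq ON hq.procpid = holder.pid;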

Thank you very much for your help!

Best regards, Nick.



Re: [PERFORM] transaction delays to apply

2009-08-13 Thread Nickolay

Tom Lane wrote:

Nickolay writes:
But occasionally a transaction seems to be delayed and its log entry 
ends up inserted out of order:

ID   timestamp
1 2009-08-08 00:00:00.111
2 2009-08-08 00:00:30.311
3 2009-08-08 00:00:00.211
Yep, that's right: sometimes by 30 seconds or even more.



You haven't provided enough information to let anyone guess at the
problem.  Have you checked to see if one of the processes is blocking
on a lock, or perhaps there's a sudden spike in system load, or what?
Watching pg_stat_activity, pg_locks, and/or "vmstat 1" output during
one of these events might help narrow down what's happening.

regards, tom lane


Thank you, guys. Problem's solved. I'm guilty and stupid :-)
One of the SELECTs in the transaction was wrong. Its job was to select 
messages from the archive by several conditions, including:

date_time::date = now()::date
(i.e. the timestamp field "date_time" was being cast to the date type). 
After the first run PostgreSQL seems to compensate for my mistake via 
caching or something similar, and further SELECTs execute in a matter 
of milliseconds.

Fixed the statement to:
date_time >= now()::date
and now everything seems to work just fine, even on the first run.
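For what it's worth, the slowdown is consistent with the cast defeating 
an ordinary index on date_time: with date_time::date = now()::date the 
planner cannot use a plain btree index on the raw column, so an 
uncached first run falls back to scanning the table. Two ways to keep 
such a query indexable, sketched assuming date_time is timestamp 
without time zone and noting that the index name below is made up:

-- Option 1: a range predicate over the raw column, which an ordinary
-- index on date_time can serve
SELECT *
  FROM archive
 WHERE date_time >= now()::date
   AND date_time <  now()::date + 1;  -- upper bound matters only if future-dated rows exist

-- Option 2: an expression index matching the original cast-based predicate
CREATE INDEX archive_date_time_date_idx ON archive ((date_time::date));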

Best regards, Nick.



Re: [PERFORM] Per-database warm standby?

2009-08-15 Thread Nickolay

I also have a question about warm standby replication.
What would be the best solution for a system with 2 DB servers (nodes), 
1 database, and a maximum of 10 seconds to switch between them 
(ready-to-switch time)?
Currently I'm using Slony, but it's rather slow when re-subscribing the 
failed node after a failover (the database can be really large, and it 
would take a few hours to COPY the tables using Slony).

Maybe WAL replication would be better?
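For comparison, an 8.x WAL-shipping warm standby amounts to roughly the 
following; this is only a sketch, the archive path and trigger file are 
illustrative placeholders, and pg_standby is the contrib program 
shipped with 8.4:

# on the primary, postgresql.conf
archive_mode = on
archive_command = 'cp %p /wal_archive/%f'        # /wal_archive: placeholder path

# on the standby, recovery.conf
restore_command = 'pg_standby -t /tmp/pgsql.trigger /wal_archive %f %p %r'

Failover then just means creating the trigger file on the standby, 
which should fit comfortably inside a 10-second window. The trade-off 
versus Slony is that the standby covers the whole cluster rather than a 
single database, and a warm standby cannot serve read-only queries 
while it is in recovery.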

Best regards, Nick.

Tom Lane wrote:

Craig James writes:

8.4 has vastly improved the warm-standby features, but it looks to me like this 
is still an installation-wide backup, not a per-database backup.  That is, if 
you have (say) a couple hundred databases, and you only want warm-backup on one 
of them, you can't do it (except using other solutions like Slony).  Is that 
right?



Correct, and that's always going to be true of any WAL-based solution.

regards, tom lane





[PERFORM] PG optimization question

2010-01-09 Thread Nickolay

Hi all,

Here is my typical configuration: 1 (or 2) GB of RAM, an HP ML 350 (or 
150) series server, SATA RAID, Linux.


I have one big table (called "archive") which contains short text 
messages along with plenty of additional service info.
Currently this table contains more than 4M rows covering a period of 
4.5 months; the average row size is about 1 KB.


I'm going to make our application work with partitions of this table 
instead of one large table. The primary reason is that eventually we 
need to remove old rows, and that would be pretty hard to do on a 
single table because of locking (rows are being added constantly).


1. What would be your recommendation on how to partition this table (by 
month, year, or quarter)?
2. What are the recommended PG settings for such a configuration? Would 
it be OK to set shared_buffers to, say, 512 MB (or maybe 400 MB if RAM 
is 1 GB)? What other settings would you recommend? (A rough sketch 
follows below.)
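A commonly cited rule of thumb for a dedicated box is shared_buffers at 
roughly 25% of RAM, which argues against 400-512 MB on the 1 GB 
machine; a rough postgresql.conf sketch for the 1 GB case (illustrative 
starting values, not measured recommendations):

shared_buffers = 256MB          # ~25% of RAM; ~512MB is a comparable start on the 2 GB box
effective_cache_size = 512MB    # rough estimate of OS cache the planner may assume
checkpoint_segments = 16        # spread checkpoint I/O for a write-heavy workload
wal_buffers = 8MB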


Thanks in advance,
Nick.




Re: [PERFORM] PG optimization question

2010-01-09 Thread Nickolay
I do not see any way to normalize this table any further. Its size is 
4 GB for ~4M rows, i.e. about 1 KB per row, which I think is OK.
There are also 2 indexes: one on date_time and one on a couple of 
service fields (total index size is 250 MB now).
I think I'll partition by month (approx. 1M rows or 1 GB per month), so 
it would be about 60 partitions for 5 years. Is that OK for Postgres?

Oh, by the way, 95% of queries search for rows from the current date 
(the last 24 hours).
We also use SELECT ... FOR UPDATE row-level locking for updating rows 
in the archive (i.e. we INSERT a new row when starting an outgoing 
message transmission, and then do a SELECT ... FOR UPDATE and UPDATE on 
the source (incoming) message once the outgoing message has been sent), 
so I guess we would have to name the partition table explicitly (i.e. 
"archive_2009_12" instead of "archive") in the SELECT ... FOR UPDATE 
and UPDATE requests, since they may need to access a row in the 
previous partition rather than the current one.
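For reference, a minimal sketch of the 8.x inheritance-style 
partitioning this implies; the partition name follows the 
archive_YYYY_MM pattern above, and only the archive table and its 
date_time column come from the thread:

-- one child table per month, constrained so the planner can prove
-- which months a given query touches
CREATE TABLE archive_2009_12 (
    CHECK (date_time >= DATE '2009-12-01' AND date_time < DATE '2010-01-01')
) INHERITS (archive);

CREATE INDEX archive_2009_12_date_time_idx ON archive_2009_12 (date_time);

-- postgresql.conf: let the planner skip partitions excluded by their CHECK constraints
-- constraint_exclusion = on

Queries against the parent archive table with a date_time range 
predicate then scan only the matching child tables, the SELECT ... FOR 
UPDATE / UPDATE statements can still name a specific partition such as 
archive_2009_12 when the row's month is known, and removing a whole 
month becomes a DROP TABLE instead of a blocking DELETE.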


Grzegorz Jaśkiewicz wrote:

Maybe that 'one big table' needs something called 'normalisation'
first. See how much that will shed off. You might be surprised.
The partitioning needs to be done at some constant interval - of time,
in your case. Whatever suits you; I would suggest choosing a rate that
gives you both ease of archiving/removal of old data (so not too wide)
and also ensures that most of the data you'll be searching for in your
queries will be in one or two partitions per query.




Re: [PERFORM] PG optimization question

2010-01-09 Thread Nickolay
That may help with query speed (not a problem now), but then we'd have 
to add a UNION with the daily staging table for the other 5% of 
requests, right? And there would be a moment when a day's message is in 
the archive table AND in the daily table (while being transferred from 
the daily table to the archive).
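A minimal sketch of such a combined view, assuming the staging table 
has the same column list as archive (the view and staging-table names 
are made up):

CREATE VIEW messages_all AS
    SELECT * FROM archive
    UNION ALL
    SELECT * FROM archive_staging;

The 95% of queries that only need the last 24 hours can keep hitting 
the staging table directly while the other 5% go through the view, and 
as long as the flush moves rows inside a single transaction, a single 
SELECT on the view sees each row exactly once (it sees the flush either 
as fully committed or not at all), so the brief both-tables window 
never shows up as duplicates.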
Our main problem is the blocking when doing DELETE (the app sometimes 
freezes for a long time), and also having to VACUUM the live table, 
which is not acceptable in our app.


Thanks for your reply. I was kind of worried about the number of 
partitions and how this would affect PG query execution speed.


Kenneth Marshall wrote:
Oh, by the way, 95% of queries search for rows from the current date 
(the last 24 hours).



You may want to use a daily staging table and then flush to the 
monthly archive tables at the end of the day.




Re: [PERFORM] PG optimization question

2010-01-09 Thread Nickolay

Okay, I see your point about the staging table. That's a good idea!
The only problem I see here is the transfer-to-archive-table process. 
As you've correctly noticed, the system is more or less real-time and 
there can be dozens of processes writing to the staging table; I cannot 
see how to make the transfer/flush process correct and clean...
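One possible shape for that flush, sketched under the assumptions that 
the writers only ever insert rows stamped with the current time and 
that everything older than today is eligible to move (archive_staging 
and the target partition name are illustrative; archive and date_time 
are from the thread):

BEGIN;

-- copy the old rows into the monthly partition, ordered so the archive
-- stays naturally clustered on date_time
INSERT INTO archive_2009_12
    SELECT *
      FROM archive_staging
     WHERE date_time < now()::date
     ORDER BY date_time;

-- then drop exactly that set from the staging table
DELETE FROM archive_staging
 WHERE date_time < now()::date;

COMMIT;

Because both statements run in one transaction, concurrent readers 
never see a row missing or duplicated. Repeating the WHERE clause is 
only safe if backdated inserts are impossible; otherwise, collect the 
target ids into a temporary table first and use that list in both 
statements.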


Pierre Frédéric Caillaud wrote:
Oh, by the way, 95% of queries search for rows from the current date 
(the last 24 hours).




You may want to use a daily staging table and then flush to the 
monthly archive tables at the end of the day.


If the rows in the archive tables are never updated, this strategy 
means you never need to vacuum the big archive tables (and indexes), 
which is good. Also, you can insert the rows into the archive table in 
the order of your choice (by timestamp, for example), which keeps it 
nicely clustered without ever needing to run CLUSTER.


And with partitioning you can have lots of indexes on the staging 
table (and the current month's partition) to speed up your most common 
queries, which are likely to be more OLTP-like, while keeping fewer 
indexes on the older partitions (saving disk space), since queries on 
old partitions are likely to be reporting queries that will grind 
through a large part of the table anyway.





