[PERFORM] transaction delays to apply
Hello All,

I'm developing a specialized message switching system and I've chosen to use PostgreSQL as the general tool to handle transactions and to store and manage all the data. This system has pretty strong timing requirements. For example, it must process not less than 10 messages per second. FYI: messages are short (approx. 400-500 bytes long). An incoming message is routed via a special routing system to its destinations (i.e. one incoming message may be routed into dozens of channels at once).

Normally this system works excellently with PostgreSQL; the performance is quite impressive. BUT sometimes bad things happen (delays). For example: I have a "log" table which contains all log entries for the system (warnings, errors, detailed routing info, rejections, etc). The table includes a "timestamp" field which defaults to now():

    CREATE TABLE log (
        id bigserial NOT NULL,
        "timestamp" timestamp without time zone NOT NULL DEFAULT now(),
        .. etc.

So when an incoming message is processed, I start a new transaction and generate the outgoing and log messages in that single transaction. Normally, viewing the log sorted by ID, it comes out in the right timing order:

    ID  timestamp
    1   2009-08-08 00:00:00.111
    2   2009-08-08 00:00:00.211
    3   2009-08-08 00:01:00.311
    etc.

BUT it seems that, rarely, this transaction is delayed in applying and the log entry is inserted out of order:

    ID  timestamp
    1   2009-08-08 00:00:00.111
    2   2009-08-08 00:00:30.311
    3   2009-08-08 00:00:00.211

Yep, that's right - sometimes by 30 seconds or even more. I do understand that there will be some delays with the database, but 30 seconds is unacceptable! Does anybody know any way to solve this? I did monitor the system running at full load (~20 messages per second) - the postmaster's processes didn't eat more than 10-20% of CPU and memory. Neither did any of my application's processes.

Best regards, Nick.
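For reference, a minimal sketch of the shape of that per-message transaction (table and column names other than log/id/"timestamp" are assumptions for illustration, not from the original schema). One detail worth noting: in PostgreSQL, now() is fixed at transaction start, so a transaction that stalls between BEGIN and the log INSERT keeps its early timestamp while later transactions grab ids first - which is exactly the out-of-order pattern shown above:

    BEGIN;
    -- Outgoing copies for each destination channel; "outgoing",
    -- "channel_id" and "body" are illustrative names.
    INSERT INTO outgoing (channel_id, body) VALUES (42, '...');
    -- The log entry: "timestamp" defaults to now(), i.e. the moment
    -- this transaction STARTED, not the moment the row is inserted.
    INSERT INTO log (message) VALUES ('routed message to channel 42');
    COMMIT;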
Re: [PERFORM] transaction delays to apply
Tom Lane wrote:
> Nickolay writes:
>> BUT it seems that rarely this transaction is being delayed to apply
>> and the log entry is being inserted out of order:
>>     ID  timestamp
>>     1   2009-08-08 00:00:00.111
>>     2   2009-08-08 00:00:30.311
>>     3   2009-08-08 00:00:00.211
>> Yep, that's right - sometimes for 30 seconds or even more.
>
> You haven't provided enough information to let anyone guess at the
> problem. Have you checked to see if one of the processes is blocking
> on a lock, or perhaps there's a sudden spike in system load, or what?
> Watching pg_stat_activity, pg_locks, and/or "vmstat 1" output during
> one of these events might help narrow down what's happening.
>
> regards, tom lane

The problem is that such a thing happens very rarely, and NOT at full load. I can't monitor the system all the time. Is there any way to investigate the situation via the pgsql logs, or to enable something like full debug? I do have a row-level lock (SELECT...FOR UPDATE) on another table during this transaction, but a single row is handled by no more than 2 processes at once and it should be very quick (SELECT, parse data and UPDATE).

Thank you very much for your help!

Best regards, Nick.
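For after-the-fact investigation without round-the-clock monitoring, the server can be told to log slow statements and lock waits by itself. A minimal postgresql.conf sketch (the thresholds are illustrative, not recommendations from the thread):

    # postgresql.conf - log any statement slower than 1 second, and log
    # whenever a session waits on a lock longer than deadlock_timeout.
    log_min_duration_statement = 1000   # in milliseconds; -1 disables
    log_lock_waits = on                 # available in 8.3 and later
    deadlock_timeout = 1s               # doubles as the lock-wait logging threshold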
Re: [PERFORM] transaction delays to apply
Tom Lane wrote:
> Nickolay writes:
>> BUT it seems that rarely this transaction is being delayed to apply
>> and the log entry is being inserted out of order:
>>     ID  timestamp
>>     1   2009-08-08 00:00:00.111
>>     2   2009-08-08 00:00:30.311
>>     3   2009-08-08 00:00:00.211
>> Yep, that's right - sometimes for 30 seconds or even more.
>
> You haven't provided enough information to let anyone guess at the
> problem. Have you checked to see if one of the processes is blocking
> on a lock, or perhaps there's a sudden spike in system load, or what?
> Watching pg_stat_activity, pg_locks, and/or "vmstat 1" output during
> one of these events might help narrow down what's happening.
>
> regards, tom lane

Thank you, guys. Problem's solved. I'm guilty and stupid :-)

One of the SELECTs in the transaction was wrong. Its job was to select messages from the archive by several conditions, including:

    date_time::date = now()::date

(i.e. the timestamp field "date_time" was being converted to the date type). After the first run PostgreSQL seems to mask my mistake through caching or something similar, and further SELECTs execute in a matter of milliseconds. Fixed the statement to:

    date_time >= now()::date

and now everything seems to work just fine, even on the first run.

Best regards, Nick.
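The likely mechanics, for the record (my reading, not stated in the thread): a cast applied to the column keeps the planner from using an ordinary index on date_time, forcing a full scan - fast on repeat runs only because the table pages are by then cached - whereas a bare range comparison can use the index. A sketch, assuming a plain btree index on the column:

    -- Assuming an ordinary btree index:
    CREATE INDEX archive_date_time_idx ON archive (date_time);

    -- Not index-friendly: the cast hides the column from the index,
    -- so every row must be fetched and cast before comparing.
    SELECT * FROM archive WHERE date_time::date = now()::date;

    -- Index-friendly range scan. Note it also admits timestamps later
    -- than today, which is harmless in an archive of past messages;
    -- the strict equivalent would be an expression index on
    -- ((date_time::date)).
    SELECT * FROM archive WHERE date_time >= now()::date;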
Re: [PERFORM] Per-database warm standby?
I also have a question about warm standby replication. What would be the best solution for a system with 2 db servers (nodes), 1 database, and at most 10 seconds to switch between them (ready-to-switch time)? Currently I'm using Slony, but it's kind of slow when re-subscribing after a failover on the failed node (the database can be really huge and it would take a few hours to COPY the tables using Slony). Maybe WAL replication would be better?

Best regards, Nick.

Tom Lane wrote:
> Craig James writes:
>> 8.4 has vastly improved the warm-standby features, but it looks to me
>> like this is still an installation-wide backup, not a per-database
>> backup. That is, if you have (say) a couple hundred databases, and
>> you only want warm-backup on one of them, you can't do it (except
>> using other solutions like Slony). Is that right?
>
> Correct, and that's always going to be true of any WAL-based solution.
>
> regards, tom lane
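For comparison, a minimal WAL-shipping warm standby on 8.3/8.4 looks roughly like the sketch below (the paths are illustrative). Failover is a matter of touching the trigger file so the standby exits recovery, which fits a 10-second budget; the trade-off is that re-seeding a failed node still requires a fresh base backup, much like Slony's re-subscribe:

    # On the primary (postgresql.conf): ship completed WAL segments
    # to a location the standby can read.
    archive_mode = on
    archive_command = 'cp %p /mnt/wal_archive/%f'

    # On the standby (recovery.conf): replay segments as they arrive,
    # using the pg_standby contrib program; it leaves recovery (i.e.
    # fails over) as soon as the trigger file appears.
    restore_command = 'pg_standby -t /tmp/pgsql.trigger /mnt/wal_archive %f %p %r'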
[PERFORM] PG optimization question
Hi 2 all,

Here is my typical configuration: 1 (or 2) GB of RAM, HP ML 350 (or 150) series server, SATA RAID, Linux.

I have one big table (called "archive") which contains short text messages with plenty of additional service info. Currently this table holds more than 4M rows for a period of 4.5 months, i.e. each row has an average size of 1 KB. I'm going to make our application work with partitions of this table instead of one large table. The primary reason is that eventually we'd need to remove old rows, and that would be pretty hard with one table because of blocking (and rows are being added constantly).

1. What would be your recommendations on how to partition this table (by months, years or quarters)?
2. What are the recommended PG settings for such a configuration? Would it be ok to set shared_buffers to, let's say, 512MB (if RAM is 1GB, maybe shared_buffers should be 400MB)? What other settings would you recommend?

Thanks in advance, Nick.
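As a reference point, month-based partitioning on the 8.x series is built from table inheritance plus CHECK constraints. A sketch, with the fields beyond id and date_time left as placeholders since the full schema isn't shown in the thread:

    -- Parent table; children inherit its columns.
    CREATE TABLE archive (
        id        bigserial NOT NULL,
        date_time timestamp NOT NULL
        -- ... message text and service fields ...
    );

    -- One child per month; the CHECK constraint is what lets the
    -- planner skip irrelevant months (requires constraint_exclusion
    -- = on, or the 'partition' setting on 8.4).
    CREATE TABLE archive_2009_12 (
        CHECK (date_time >= DATE '2009-12-01'
           AND date_time <  DATE '2010-01-01')
    ) INHERITS (archive);

    CREATE INDEX archive_2009_12_date_time_idx
        ON archive_2009_12 (date_time);

    -- Removing an old month becomes an instant DROP TABLE (or
    -- ALTER TABLE ... NO INHERIT) instead of a blocking bulk DELETE:
    -- DROP TABLE archive_2009_05;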
Re: [PERFORM] PG optimization question
I do not see any way to normalize this table any more. Its size is 4 GB for ~4M rows, i.e. 1 KB per row, which I think is OK. Also there are 2 indexes: one on date_time and one on a couple of service fields (total index size is 250 MB now). I think I'll go with partitioning by months (approx. 1M rows or 1 GB per month), so it would be around 60 partitions for 5 years. Is that OK for postgres?

Oh, btw, 95% of queries are searching rows for the current date (last 24 hours). Also we use SELECT...FOR UPDATE row-level locking for updating rows in the archive (i.e. we INSERT a new row when starting an outgoing message transmission, and then do SELECT...FOR UPDATE and UPDATE on the source (incoming) message once the outgoing message has been sent), so I guess we would have to explicitly write the name of the partition table (i.e. "archive_2009_12" instead of "archive") in the SELECT...FOR UPDATE and UPDATE requests, as they may need to access a row in a previous partition instead of the current one.

Grzegorz Jaśkiewicz wrote:
> maybe that 'one big table' needs something called 'normalisation'
> first. See how much that will shed off. You might be surprised.
> The partitioning needs to be done by some constant intervals - of
> time, in your case. Whatever suits you, I would suggest using a rate
> that gives you both ease of archiving/removal of old data (so not too
> wide), and also one that makes sure that most of the data you'll be
> searching for in your queries will be in one or two partitions per
> query.
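For completeness, the usual 8.x companion to the DDL sketched earlier is a trigger routing INSERTs on the parent to the right monthly child (a sketch; the branch would be extended as months roll over). The instinct to name partitions explicitly in SELECT...FOR UPDATE is sound, by the way: constraint exclusion only prunes on plan-time constants, and now() is not one.

    CREATE OR REPLACE FUNCTION archive_insert_trigger() RETURNS trigger AS $$
    BEGIN
        -- One branch per active partition.
        IF NEW.date_time >= DATE '2009-12-01'
           AND NEW.date_time < DATE '2010-01-01' THEN
            INSERT INTO archive_2009_12 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'archive: no partition for date_time %',
                NEW.date_time;
        END IF;
        RETURN NULL;  -- suppress the insert into the (empty) parent
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER archive_partition_insert
        BEFORE INSERT ON archive
        FOR EACH ROW EXECUTE PROCEDURE archive_insert_trigger();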
Re: [PERFORM] PG optimization question
That may help with query speed (not a problem now), but we'd then have to add a UNION with the daily staging table for the other 5% of requests, right? And there would be a moment when a day's message is in the archive table AND in the daily table (while being transferred from the daily table to the archive). Our main problem is the blocking when doing DELETE (the app sometimes freezes for a long time), and also that we have to VACUUM the live table, which is not acceptable in our app.

Thanks for your reply; I was kind of worried about the number of partitions and how that would affect PG query execution speed.

Kenneth Marshall wrote:
>> Oh, btw, 95% of queries are searching rows for current date (last 24
>> hours).
>
> You may want to use a daily staging table and then flush to the
> monthly archive tables at the end of the day.
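On the in-both-tables-at-once worry: if the flush runs as a single transaction, other sessions see the move atomically at commit and never see a row duplicated or missing. A minimal sketch, assuming a staging table named "archive_staging" and that writer transactions are short enough that no id at or below the captured mark is still uncommitted when the flush starts:

    -- 1. Capture the high-water mark first (application side or psql):
    --      SELECT max(id) FROM archive_staging;  -- suppose: 123456
    -- 2. Move everything up to that mark in one transaction:
    BEGIN;
    INSERT INTO archive_2009_12
        SELECT * FROM archive_staging
        WHERE id <= 123456
        ORDER BY date_time;        -- keeps the archive time-clustered
    DELETE FROM archive_staging
        WHERE id <= 123456;        -- rows arriving mid-flush are untouched
    COMMIT;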
Re: [PERFORM] PG optimization question
Okay, I see your point with the staging table. That's a good idea!

The only problem I see here is the transfer-to-archive-table process. As you've correctly noticed, the system is kind of real-time, and there can be dozens of processes writing to the staging table; I cannot see how to make the transfer/flush process right and clean...

Pierre Frédéric Caillaud wrote:
>>> Oh, btw, 95% of queries are searching rows for current date (last
>>> 24 hours).
>>
>> You may want to use a daily staging table and then flush to the
>> monthly archive tables at the end of the day.
>
> If the rows in the archive tables are never updated, this strategy
> means you never need to vacuum the big archive tables (and indexes),
> which is good. Also you can insert the rows into the archive table in
> the order of your choice, the timestamp for example, which makes it
> nicely clustered, without needing to ever run CLUSTER.
>
> And with partitioning you can have lots of indexes on the staging
> table (and the current month's partition) to speed up your most
> common queries, which are likely to be more OLTP, while using fewer
> indexes on the older partitions (saves disk space), if queries on old
> partitions are likely to be reporting queries which are going to
> grind through a large part of the table anyway.
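One common pattern for exactly this situation - sketched here with illustrative table names, not as a definitive recipe - is to atomically swap in a fresh, empty staging table and drain the old one afterwards. The rename takes an exclusive lock, so it waits for in-flight writers to commit and briefly blocks new ones, but the swap itself is near-instant:

    -- Swap an empty staging table in under the same name. INCLUDING
    -- DEFAULTS also reuses the same id sequence, so ids stay
    -- monotonic across the swap.
    BEGIN;
    ALTER TABLE archive_staging RENAME TO archive_staging_flush;
    CREATE TABLE archive_staging
        (LIKE archive_staging_flush
         INCLUDING DEFAULTS INCLUDING INDEXES);  -- INDEXES needs 8.3+
    COMMIT;

    -- Once the swap commits, new inserts land in the fresh table and
    -- the old one is static, so it can be drained with no races:
    INSERT INTO archive_2009_12
        SELECT * FROM archive_staging_flush ORDER BY date_time;
    DROP TABLE archive_staging_flush;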