On Mon, Feb 17, 2014 at 8:45 AM, Herouth Maoz <hero...@unicell.co.il> wrote:
> I have a production system using Postgresql 9.1.2. > > The system basically receives messages, puts them in a queue, and then > several parallel modules, each in its own thread, read from that queue, and > perform two inserts, then release the message to the next queue for > non-database-related processing. > > Today, after we received complaints from a customer about delays, I > noticed odd behavior in the system. About 2 percent of the messages were > inserted into the tables more than an hour after they got into the system. > How do you know that? > > The queue never has more than 27,000 messages at the same time, and all > together, the parallel modules process about 5000 or 6000 messages per > minute. So basically, the delay for a single message should never be more > than a few minutes. Even if one module gets stuck, another will grab the > next message from the queue. I believe the only way for a message to be > stuck for so long would be for it to be grabbed by a module, and then for > the database write to be somehow delayed for a hour, although it's merely a > simple insert performed with a prepared statement. > > The database in production is very busy with millions of writes per hour. > Could there be a situation in which a particular connection gets "starved" > while other connections are able to run queries without noticeable delay? > If there is a delay like that, it would almost certainly be due to database locks that show up in pg_locks. http://www.postgresql.org/docs/current/static/view-pg-locks.html http://wiki.postgresql.org/wiki/Lock_Monitoring http://wiki.postgresql.org/wiki/Lock_dependency_information But, I doubt that that is your problem. > > How can I truck such locks down? Does anybody have any ideas other than > starvation? The system lets me view statistics of how many messages were > processed in each modules and the average latency. None of the four modules > running has long average latency or low number of messages processes, so I > don't think the issue is related to any particular thread in my (Java) > system being slow or locked away by the others. > If the insert into PostgreSQL was freezing, wouldn't that time get reflected in your latency monitoring? It sounds to me like your application has a bug in its queue, where it forgets about items on the queue for a while. Cheers, Jeff