Did you try using NOWAIT instead of SKIP LOCKED to see if the behavior still shows up?
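For anyone following along, the two options fail differently under contention: SKIP LOCKED silently passes over rows another transaction holds, while NOWAIT raises an error on the first contended row, which makes heavy row-lock contention visible. A sketch of both variants, assuming a queue table and column names like those Jim described (names are illustrative, not his actual schema):

```sql
-- Claim a batch, skipping rows other workers already hold:
SELECT q_id
FROM queue
ORDER BY q_id
LIMIT 250
FOR UPDATE SKIP LOCKED;

-- Same query with NOWAIT: instead of skipping, it raises
-- ERROR 55P03 (lock_not_available) on the first locked row,
-- so a burst of these errors would confirm row-lock contention.
SELECT q_id
FROM queue
ORDER BY q_id
LIMIT 250
FOR UPDATE NOWAIT;
```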
On Tue, Aug 18, 2020, 8:22 PM Jim Jarvie <j...@talentstack.to> wrote:

> Thank you for the quick response.
>
> No adjustments of fill factors. Hadn't thought of that - I'll investigate
> and try some options to see if I can measure an effect.
>
> There is some ordering on the select [ORDER BY q_id], so each block of 250
> is sequential-ish queue items; I just need them more or less in the order
> they were queued, as near FIFO as possible without being totally strict
> about absolute sequential order.
>
> The table has around 192K rows. As a row is processed it is deleted as part
> of the transaction, with a commit at the end after all 250 are processed
> [partitioned table; the state changes and the row migrates to a different
> partition], and as the queue drops to 64K rows it is topped up with 128K
> rows at a time.
>
> I've tuned the LIMIT value both up and down. As I move the limit up, the
> problem becomes substantially worse: 300 swamps it and the selects take
> more than an hour to complete; at 600 they all lock everything up and
> processing stops. I did try 1,000, but it basically resulted in nothing
> being processed.
>
> Fewer processes do not give the required throughput, because the queue
> sends data elsewhere with a long round-trip time, but that destination does
> permit over 1K concurrent connections as its work-around for throughput.
> I'm stuck having to scale up my concurrent processes to compensate for the
> long processing time of an individual queue item.
>
> On 18-Aug.-2020 20:08, Michael Lewis wrote:
>
> Message queue...
> Are rows deleted? Are they updated once or many times? Have you adjusted
> the fillfactor on the table or indexes? How many rows are in the table
> currently, or on average? Is there any ordering to which rows you update?
> It seems likely that one of the experts/code contributors will chime in
> and explain how locking that many rows across that many concurrent
> connections overruns some resource, so you are escalating to a table lock
> instead of actually locking only the 250 rows you wanted.
>
> On the other hand, you say 80 cores, and you are trying to increase the
> number of concurrent processes well beyond that without (much) disk I/O
> being involved. I wouldn't expect that to perform well.
>
> Is there a chance to modify the code to permit each process to lock 1,000
> rows at a time and be content with 64 concurrent processes?
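The claim-process-delete cycle discussed above, with Michael's suggested larger batch and fewer workers, could be sketched roughly as below. Table and column names, and the `:claimed_ids` bind parameter, are hypothetical placeholders, not Jim's actual schema:

```sql
-- One worker iteration (sketch only):
BEGIN;

-- Claim a larger batch so fewer concurrent workers are needed:
SELECT q_id, payload
FROM queue
ORDER BY q_id          -- near-FIFO, not strictly ordered across workers
LIMIT 1000
FOR UPDATE SKIP LOCKED;

-- ...worker processes each claimed item, then removes it...
DELETE FROM queue
WHERE q_id = ANY (:claimed_ids);  -- ids gathered from the SELECT above

COMMIT;  -- all row locks are released here
```

With batches this large, the trade-off is that rows stay locked for the whole batch's processing time, so a crash or slow worker delays more of the queue at once.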