On 10/12/2013, at 20:55, Jeff Janes wrote:

> 
> On Tue, Dec 10, 2013 at 8:23 AM, Herouth Maoz <hero...@unicell.co.il> wrote:
> 
> Hello.
> 
> I have one particular table with very specialized use. I am sending messages 
> to some partner. The partner processes them asynchronously, and then returns 
> the status report to me. The table is used to store a serialized version of 
> the message object, together with a few identifiers, expiry date, and a 
> reference from the partner. When the status report comes back from the 
> partner, we:
> 
> 1. Select the record using the partner's reference number.
> 2. Reconstruct and queue the object.
> 3. Delete the record from the database using the serial primary key ID.
> 
> Where do you "queue" the object?  Back into a different table within the same 
> database?  Why not use "DELETE ... RETURNING"?

The object is used in a Java application; it goes into an in-memory queue.
Some of its data is later updated in two different tables in the same
database, but only after some logic has run, and that logic may or may not
change the values from what the partner returned.

I'm not using DELETE ... RETURNING because the application is old code,
originally written around 2002, before we even had PostgreSQL in the
organization. It's a sensitive piece of code, so I'm unlikely to get
permission to change it without a very good reason.
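
For reference, if that code could ever be touched, the select-then-delete
pair could collapse into a single statement, something along these lines
(just a sketch, using the table and column names from the queries quoted
further down):

    -- one round trip: delete by the partner's reference and hand the
    -- serialized object back to the application in the same statement
    delete from transient
     where smsc_reference = ? and msisdn = ?
    returning id, sm;

Besides saving a round trip, that would close the window between the
select and the delete.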

>  
> 
> Every once in a while we run a garbage collection process which selects 
> expired messages, reconstructs and queues them for processing, and deletes 
> the record using the primary key.
> 
> This works wonderfully as long as the table remains small - a few thousand 
> records waiting for their status report, and that's it. The table is set to 
> have frequent auto-analyze runs.
> 
> You might want to turn 

...yes?

>  
> 
> The problem starts when our partner has some glitch, under high load, and 
> fails to send back a few hundred thousand reports. In that case, the table 
> grows to a few hundred thousand records, and they are not deleted until they hit their 
> expiry date, at which point the "garbage collector" takes care of them and 
> everything goes back to normal. When it contains hundreds of thousands of 
> records, performance deteriorates considerably.
> 
> There is no inherent reason the performance needs to degrade.  Can you give 
> the specific queries that perform worse?

They are the simplest queries possible. The deterioration is only fractions
of a second per query, but with thousands of status reports per minute it
accumulates into a large delay within a few minutes.

The queries are:

    select id, sm from transient where smsc_reference = ? and msisdn = ?

and then:

    delete from transient where id = ?
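
If it would help pin down where the fractions of a second go, an EXPLAIN
run with real values substituted for the placeholders would show whether
the lookup is still hitting the index or wading through dead rows. A
sketch, with dummy values:

    explain (analyze, buffers)
    select id, sm from transient
     where smsc_reference = 'some-ref' and msisdn = 'some-msisdn';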

> 
>  
> I am trying to figure out a solution that will keep the system working well 
> even when there is a large number of records in the table. At first I thought 
> of partitioning the table on the partner's reference field. But the DELETE 
> would actually slow down if I do this, right?
> 
> Not necessarily, but partitioning should be your last resort, not your 
> first, and I don't see any reason it would be needed here.
> 
> Indexes:
>     "transient_pkey" PRIMARY KEY, btree (id)
>     "transient_msisdn_ref" btree (msisdn, smsc_reference)
>     "transient_sm_vp" btree (validity)
> 
> What is the partner reference?  If it is smsc_reference, then you probably 
> need a index in which that is the lead (or only) column.

Well, I simplified a bit, but as you can see from the query above, it queries 
on both the smsc_reference and the msisdn.
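
Since both columns are compared with equality, the existing
transient_msisdn_ref index on (msisdn, smsc_reference) should already serve
that lookup. If lookups on smsc_reference alone were ever needed, an index
with it as the lead column, as Jeff suggests, would look like this (the
index name here is just illustrative):

    create index transient_ref_msisdn
        on transient (smsc_reference, msisdn);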


Thanks,
Herouth
