We've occasionally talked about allowing pending-trigger-event lists to spill to disk when there get to be so many events that it's a memory problem. I'm not especially interested in doing that right now, but I noticed recently that we could alleviate the problem a lot by adopting a more compact representation.
Currently, each event is a separately palloc'd instance of struct AfterTriggerEventData. On a 32-bit machine that struct takes 32 bytes, plus 8 bytes palloc overhead = 40 bytes. On a 64-bit machine the struct takes 36 bytes, but palloc rounds that up to 64 bytes, plus there's 16 bytes palloc overhead = 80 bytes :-(. I see several things we could do here: * Allocate the event structs in reasonably-large arrays instead of separate palloc chunks. This would require some data copying where we now get away with pointer-swinging --- but on the other hand per-event palloc'ing isn't exactly free either, so I suspect that this would net out to a wash if not an actual speedup. * Don't store the ate_tgoid and ate_relid fields in each individual event struct. Instead keep a separate array with one instance of these values for each distinct trigger that's been fired in the current transaction (in most cases this list should be pretty short, even if there are many events). We can commandeer the high order bits of ate_event to store an index into that array. Currently only 8 bits of ate_event are actually used, so we'd have room for 16 million distinct triggers fired in a transaction. Even if we need a few more ate_event flag bits later, I don't see a problem there. * Don't store two ItemPointers in insert or delete events. This would make the array element stride variable, but since we don't need random access into the arrays AFAICS, that doesn't seem to be a problem. In combination these changes would get us down to 16 bytes per insert/delete and 20 per update event, which represents a factor of 2 or 2.5 savings on a 32-bit machine and a factor of 4 or 5 on a 64-bit machine. Seems worth doing to me, especially since it looks like only about a 1-day project touching only a single source file. It might be possible to go further and move the event status bits and firing_id into the separate array, which would save a further four bytes per event in the typical situation that a lot of events of the same trigger are queued by a single command. I think I'd want to tackle that as a follow-on patch though, because it would be a change in the data structure semantics not just rearranging the representation a tad. Comments, better ideas? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers