Thanks David. I've been thinking hard about this one, and I tried to keep the details of the application requirements to a minimum in my question to focus on the Postgres performance aspect. Though it may be off topic, let me try to describe the reason I'm trying not to delete rows. Originally, my idea was exactly what you've described: move the data to an archive table, but here is the problem:
The rows contain metadata for subgraphs (partitions) of a huge graph structure. Each partition has its own local coordinate system for its nodes, and a parallel processing framework goes through the partitions looking for patterns. When a pattern is found, the matching coordinates need to be converted to global coordinates, and the partition metadata makes that possible. However, partition metadata is built via calculations based on the previous partitions' metadata. If I delete or update a single row, it means the graph structure has changed, and I have to update the metadata for all partitions that follow the deleted/updated one, which means updating lots of rows in the DB. If I leave the dead/disabled partition in place and simply add a copy of it as if it were a new partition, then all is fine; all I have to do is skip the dead/disabled partition during graph pattern search.

I am not happy about the enabled/disabled status leaking into other layers, and a view may be a good solution; thanks for that! (I've put a quick sketch of what I have in mind at the end of this message.) Nor am I happy about hiding dirt under the carpet rather than putting it in the rubbish bin, but the performance penalty is simply too big.

Best regards
Seref

On Thu, Aug 30, 2012 at 8:09 PM, David Johnston <pol...@yahoo.com> wrote:

> From: pgsql-general-ow...@postgresql.org
> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Seref Arikan
> Sent: Thursday, August 30, 2012 12:48 PM
> To: PG-General Mailing List
> Subject: [GENERAL] Performance implications of adding a "disabled" column to a table
>
> Greetings,
> I have a large number of rows (up to 3-4 million) that I'll either be fetching into RAM (if it is only a few thousand rows) or scrolling through with a cursor.
> Deletions or updates to the content of these rows lead to expensive operations in my business logic, so I simply need to mark them as disabled.
> I have two options here: exclude the disabled rows from my query results via "... where not disabled", or skip disabled rows in the business logic.
> There will be very few disabled rows compared to the total rows I'll have to return. This table will have very few queries, so including "disabled = false" or "enabled = true" in every query would not be an issue.
>
> I have a feeling that including this criterion in the queries may add unnecessary processing, since the table is expected to grow to 100 million or more rows. So should I deal with this in the DB, or at the application layer?
>
> Kind regards
> Seref
>
> ============================================================================
>
> So you are willing to live with the fact that whatever apparently important business-logic processing occurs when a row is updated or deleted is in fact not important, since you are now keeping a class of data that is not useful but is still present and reflected elsewhere in the system.
>
> If the proportion of rows that are disabled is small, and you don't intend to query "where disabled = TRUE", then adding the column is sufficient. No index is required since it will not be used. Whether you alter queries to add "disabled = FALSE" or read the flag and deal with it in the application doesn't really matter (unless individual records are large). Personally, I would suggest creating VIEWs that do not output "disabled" and that always apply "disabled = FALSE". Call it something like "tablename_active" and use that view, thus only sending enabled records to the application layer.
> I would also make sure I am very confident on the implications of doing this versus deleting the data outright (though possibly moving it to an "archive" table).
>
> David J.
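
P.S. Here is a minimal sketch of the view approach, against a hypothetical, simplified partition_metadata table; the table and column names are illustrative, not my actual schema:

-- Hypothetical, simplified metadata table; the real columns are omitted.
CREATE TABLE partition_metadata (
    id        bigserial PRIMARY KEY,
    origin_x  bigint NOT NULL,   -- placeholder for the local-to-global offset data
    origin_y  bigint NOT NULL,
    disabled  boolean NOT NULL DEFAULT false
);

-- A view along the lines of the "tablename_active" suggestion: it exposes
-- only enabled partitions and hides the flag itself from the upper layers.
CREATE VIEW partition_metadata_active AS
SELECT id, origin_x, origin_y
FROM   partition_metadata
WHERE  NOT disabled;

-- "Deleting" a partition then becomes a one-row update instead of an
-- expensive cascade of metadata recalculations:
UPDATE partition_metadata SET disabled = true WHERE id = 42;

The pattern-search layer would only ever read from partition_metadata_active, so the enabled/disabled status stays inside the database.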