On Thu, Jan 26, 2017 at 4:37 PM, Jonathan Vanasco <postg...@2xlp.com> wrote:
> There are over 20 million records in a self-referential database table, > where one record may point to another record as a descendant. > > Because of a bug in application code, there was no limit on recursion. > The max was supposed to be 4. A few outlier records have between 5 and > 5000 descendants (there could be more. I manually found one chain of 5000. > > I need to find all the chains of 5+ and mark them for update/deletion. > While the database is about 10GB, the recursive search is maxing out on > diskspace and causing a failure (there was over over 100GB of workspace > free) > > Is there any way to make a recursive query work, or will I have to use > another means and just iterate over the entire dataset (either in postgres > or an external service) > Thinking aloud - why doesn't just finding every record with 5 descendants not work? Any chain longer than 5 would have at least 5 items. Even without recursion you could build out a five-way self-join and any records that make it that far are guilty. I suppose this assumes your setup is non-cyclic. David J.