On Mon, Apr 18, 2016 at 7:48 AM, Teodor Sigaev <teo...@sigaev.ru> wrote:
>>> Added, see attached patch (based on v3.1)
>>
>>
>> With this applied, I am getting a couple errors I have not seen before
>> after extensive crash recovery testing:
>> ERROR: attempted to delete invisible tuple
>> ERROR: unexpected chunk number 1 (expected 2) for toast value
>> 100338365 in pg_toast_16425
>
> Huh, seems, it's not related to GIN at all... Indexes don't play with toast
> machinery. The single place where this error can occur is a heap_delete() -
> deleting already deleted tuple.

Those are two independent errors. The delete invisible tuple error
doesn't occur on toast tables. The actual statement triggering the
error is an update statement. Since it is showing up in the delete
path, I assume it must be an update where the new tuple goes to a
different page. But, if the soon-to-be-old tuple is not visible, why
is the update trying to update it in the first place? It seems like
the different parts of the code disagree on what is visible.

update foo set count=count+1,text_array=$1 where text_array @> $2

I agree it might not have anything to do with gin indexes, but I
didn't see it in testing anything else. It might be a wrap-around
problem which for some reason only the gin test is efficient at
evoking.

What I've done now is apply your v4 patch directly to
e95680832854cf300e64c1 and I am trying to see if it also has the
problem. If that is clean, then it is probably an independently
introduced bug which is just getting exercised by the gin index stress
test. If that is the case I'll try to git bisect forward, but that
could take weeks given the runtimes involved.

If that is dirty, then maybe the FSM vacuuming patch
introduced/uncovered more than one bug, and should be reverted.

>> I've restarted the test harness with intentional crashes turned off,
>> to see if the problems are related to crash recovery or are more
>> generic than that.

I do not see the problem when there is no crash-recovery cycling involved.

I also do not see the problem when compiled under --enable-cassert,
but that could just be because compiling that way makes it too slow to
get in sufficient testing to hit the bug before I gave up.

>>
>> I've never seen these particular problems before, so don't have much
>> insight into what might be going on or how to debug it.
>
> Check my reasoning: In version 4 I added a remembering of the tail of the
> pending list into the blknoFinish variable.
> And when we read the page which was the tail at cleanup start, we set the
> cleanupFinish variable, and after cleaning that page we stop further
> cleanup. Any insert made during cleanup will be placed after blknoFinish
> (corner case: in that page), so vacuum should not miss tuples marked as
> deleted.

Yes, I agree with the correctness of v4. But I do wonder if we should
use that early stopping for vacuum and gin_clean_pending_list, rather
than just using it for user backends. While I think correctness
allows it to stop early, since these routines are explicitly about
cleaning things up it seems like they should volunteer to clean the
whole thing.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
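
To make the early-stop behaviour discussed above concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not PostgreSQL code: the pending list is modelled as a simple queue of block numbers, the function name cleanup_pending_list and the parameters concurrent_inserts and stop_early are hypothetical, and only blkno_finish and cleanup_finish mirror the blknoFinish and cleanupFinish variables named in the thread.

```python
from collections import deque


def cleanup_pending_list(pending, concurrent_inserts=(), stop_early=True):
    """Toy model of pending-list cleanup (an illustration, not the real GIN code).

    pending            -- deque of block numbers, head first (hypothetical model)
    concurrent_inserts -- block numbers that other backends append while the
                          cleanup runs, one per cleaned page (hypothetical)
    stop_early         -- True: stop after the page that was the tail when
                          cleanup started; False: clean the whole list
    Returns the block numbers that were cleaned, in order.
    """
    if not pending:
        return []

    # Remember the tail of the pending list at cleanup start.
    blkno_finish = pending[-1]
    inserts = deque(concurrent_inserts)
    cleaned = []

    while pending:
        blkno = pending.popleft()
        # Reaching the remembered tail means this is the last page to clean.
        cleanup_finish = stop_early and blkno == blkno_finish
        cleaned.append(blkno)

        # Simulate a concurrent insert: new pages always go after the tail.
        if inserts:
            pending.append(inserts.popleft())

        if cleanup_finish:
            break

    return cleaned
```

In this model, starting from pages 10, 11, 12 with two pages appended during cleanup, stop_early=True cleans only 10, 11, 12 and leaves the new pages in the list, while stop_early=False keeps going until the list is empty. The second mode corresponds to what I am suggesting vacuum and gin_clean_pending_list might do.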