Hi,

> testing with master as of cf366e97ff, sqlsmith occasionally triggers the
> following assertion:
>
> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) 
> (&(bufHdr)->content_lock))))", File: "bufmgr.c", Line: 3397)
>
> Backtraces always look like the one below.  It is reproducible on a
> cluster once it happens.  I could provide a tarball if needed.
>
> regards,
> Andreas
>
> #2  0x00000000008324b1 in ExceptionalCondition 
> (conditionName=conditionName@entry=0x9e4e28 "!(LWLockHeldByMe(((LWLock*) 
> (&(bufHdr)->content_lock))))", errorType=errorType@entry=0x87b03d 
> "FailedAssertion", fileName=fileName@entry=0x9e5856 "bufmgr.c", 
> lineNumber=lineNumber@entry=3397) at assert.c:54
> #3  0x0000000000706971 in MarkBufferDirtyHint (buffer=2844, 
> buffer_std=buffer_std@entry=1 '\001') at bufmgr.c:3397
> #4  0x00000000004b3ecd in _hash_kill_items (scan=scan@entry=0x66dcf70) at 
> hashutil.c:514
> #5  0x00000000004a9c1b in hashendscan (scan=0x66dcf70) at hash.c:512
> #6  0x00000000004cf17a in index_endscan (scan=0x66dcf70) at indexam.c:353
> #7  0x000000000061fa51 in ExecEndIndexScan (node=0x3093f30) at 
> nodeIndexscan.c:852
> #8  0x0000000000608e59 in ExecEndNode (node=<optimized out>) at 
> execProcnode.c:715
> #9  0x00000000006045b8 in ExecEndPlan (estate=0x3064000, planstate=<optimized 
> out>) at execMain.c:1540
> #10 standard_ExecutorEnd (queryDesc=0x30cb880) at execMain.c:487
> #11 0x00000000005c87b0 in PortalCleanup (portal=0x1a60060) at portalcmds.c:302
> #12 0x000000000085cbb3 in PortalDrop (portal=0x1a60060, 
> isTopCommit=<optimized out>) at portalmem.c:489
> #13 0x0000000000736ed2 in exec_simple_query (query_string=0x315b7a0 "...") at 
> postgres.c:1111
> #14 0x0000000000738b51 in PostgresMain (argc=<optimized out>, 
> argv=argv@entry=0x1a6c6c8, dbname=<optimized out>, username=<optimized out>) 
> at postgres.c:4071
> #15 0x0000000000475fef in BackendRun (port=0x1a65b90) at postmaster.c:4317
> #16 BackendStartup (port=0x1a65b90) at postmaster.c:3989
> #17 ServerLoop () at postmaster.c:1729
> #18 0x00000000006c8662 in PostmasterMain (argc=argc@entry=4, 
> argv=argv@entry=0x1a3f540) at postmaster.c:1337
> #19 0x000000000047729d in main (argc=4, argv=0x1a3f540) at main.c:228
>


Hi,

Thanks for reporting this problem. Could you please let me know on for
how long did you run sqlsmith to get this crash. However, I have found
the reason for this crash. This is basically happening when trying to
retrieve the tuples using cursor. Basically the current hash index
scan work tuple-at-a-time which means once it finds tuple on page, it
releases lock from the page but keeps pin on it and finally returns
the tuple. When the requested number of tuples are processed there is
no lock on the page that was being scanned but yes there is a pin on
it. Finally, when trying to close a cursor at the end of scan, if any
killed tuples has been identified we try to first mark these items as
dead with the help of _hash_kill_items(). But, since we only have pin
on this page, the assert check 'LWLockHeldByMe()' fails.

When scanning tuples using normal SELECT * statement, before moving to
next page in a bucket we first deal with all the killed items but we
do this without releasing lock and pin on the current page. Hence,
with SELECT queries this crash is not visible.

The attached patch fixes this. But, please note that all these changes
will get removed with the patch for page scan mode - [1].

[1] - 
https://www.postgresql.org/message-id/CA%2BTgmobYTvexcjqMhXoNCyEUHChzmdC_2xVGgj7eqaYVgoJA%2Bg%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..0bacef8 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -478,7 +478,7 @@ hashrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan, false);
 
 	_hash_dropscanbuf(rel, so);
 
@@ -509,7 +509,7 @@ hashendscan(IndexScanDesc scan)
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan, false);
 
 	_hash_dropscanbuf(rel, so);
 
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 2d92049..414cc6a 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -467,7 +467,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 
 					/* Before leaving current page, deal with any killed items */
 					if (so->numKilled > 0)
-						_hash_kill_items(scan);
+						_hash_kill_items(scan, true);
 
 					/*
 					 * ran off the end of this page, try the next
@@ -524,7 +524,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 
 					/* Before leaving current page, deal with any killed items */
 					if (so->numKilled > 0)
-						_hash_kill_items(scan);
+						_hash_kill_items(scan, true);
 
 					/*
 					 * ran off the end of this page, try the next
diff --git a/src/backend/access/hash/hashutil.c b/src/backend/access/hash/hashutil.c
index 2e99719..f24bc4c 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -456,11 +456,15 @@ _hash_get_newbucket_from_oldbucket(Relation rel, Bucket old_bucket,
  * current page and killed tuples thereon (generally, this should only be
  * called if so->numKilled > 0).
  *
+ * The caller must have pin on so->hashso_curbuf, but may or may not have
+ * read-lock, as indicated by haveLock.  Note that we assume read-lock
+ * is sufficient for setting LP_DEAD hint bits.
+ *
  * We match items by heap TID before assuming they are the right ones to
  * delete.
  */
 void
-_hash_kill_items(IndexScanDesc scan)
+_hash_kill_items(IndexScanDesc scan, bool haveLock)
 {
 	HashScanOpaque	so = (HashScanOpaque) scan->opaque;
 	Page	page;
@@ -472,6 +476,10 @@ _hash_kill_items(IndexScanDesc scan)
 
 	Assert(so->numKilled > 0);
 	Assert(so->killedItems != NULL);
+	Assert(BufferIsValid(so->hashso_curbuf));
+
+	if (!haveLock)
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
 
 	/*
 	 * Always reset the scan state, so we don't look for same
@@ -513,4 +521,7 @@ _hash_kill_items(IndexScanDesc scan)
 		opaque->hasho_flag |= LH_PAGE_HAS_DEAD_TUPLES;
 		MarkBufferDirtyHint(so->hashso_curbuf, true);
 	}
+
+	if (!haveLock)
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
 }
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index eb1df57..89fc319 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -393,7 +393,7 @@ extern BlockNumber _hash_get_oldblock_from_newbucket(Relation rel, Bucket new_bu
 extern BlockNumber _hash_get_newblock_from_oldbucket(Relation rel, Bucket old_bucket);
 extern Bucket _hash_get_newbucket_from_oldbucket(Relation rel, Bucket old_bucket,
 								   uint32 lowmask, uint32 maxbucket);
-extern void _hash_kill_items(IndexScanDesc scan);
+extern void _hash_kill_items(IndexScanDesc scan, bool haveLock);
 
 /* hash.c */
 extern void hashbucketcleanup(Relation rel, Bucket cur_bucket,
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to