d-c-manning commented on code in PR #7063:
URL: https://github.com/apache/hbase/pull/7063#discussion_r2136225740


##########
hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestRestoreSnapshotHelper.java:
##########
@@ -218,6 +230,104 @@ public void testCopyExpiredSnapshotForScanner() throws IOException, InterruptedE
       .copySnapshotForScanner(conf, fs, rootDir, restoreDir, snapshotName));
   }
 
+  /**
+   * Test scenario for HBASE-29346, which addresses the issue where restoring snapshots after region
+   * merge operations could lead to missing store file references, potentially resulting in data
+   * loss.
+   * <p>
+   * This test performs the following steps:
+   * </p>
+   * <ol>
+   * <li>Creates a table with multiple regions.</li>
+   * <li>Inserts data into each region and flushes to create store files.</li>
+   * <li>Takes a snapshot of the table and performs a restore.</li>
+   * <li>Disables compactions, merges regions, creates a new snapshot, and restores that snapshot
+   * on the same restore path.</li>
+   * <li>Verifies data integrity by creating another snapshot.</li>
+   * </ol>
+   */
+  @Test
+  public void testMultiSnapshotRestoreWithMerge() throws IOException, InterruptedException {
+    rootDir = TEST_UTIL.getDefaultRootDirPath();
+    CommonFSUtils.setRootDir(conf, rootDir);
+    TableName tableName = TableName.valueOf("testMultiSnapshotRestoreWithMerge");
+    Path restoreDir = new Path("/hbase/.tmp-snapshot/restore-snapshot-dest");
+
+    byte[] columnFamily = Bytes.toBytes("A");
+    Table table = TEST_UTIL.createTable(tableName, new byte[][] { columnFamily },
+      new byte[][] { new byte[] { 'b' }, new byte[] { 'd' } });
+    Put put1 = new Put(Bytes.toBytes("a")); // Region 1: [-∞, b)
+    put1.addColumn(columnFamily, Bytes.toBytes("q"), Bytes.toBytes("val1"));
+    table.put(put1);
+    Put put2 = new Put(Bytes.toBytes("b")); // Region 2: [b, d)
+    put2.addColumn(columnFamily, Bytes.toBytes("q"), Bytes.toBytes("val2"));
+    table.put(put2);
+    Put put3 = new Put(Bytes.toBytes("d")); // Region 3: [d, +∞)
+    put3.addColumn(columnFamily, Bytes.toBytes("q"), Bytes.toBytes("val3"));
+    table.put(put3);
+
+    TEST_UTIL.getAdmin().flush(tableName);
+
+    String snapshotOne = tableName.getNameAsString() + "-snapshot-one";
+    createAndAssertSnapshot(tableName, snapshotOne);
+    RestoreSnapshotHelper.copySnapshotForScanner(conf, fs, rootDir, restoreDir, snapshotOne);
+    flipComactions(false);
+    mergeRegions(tableName, 2);
+    String snapshotTwo = tableName.getNameAsString() + "-snapshot-two";
+    createAndAssertSnapshot(tableName, snapshotTwo);
+    RestoreSnapshotHelper.copySnapshotForScanner(conf, fs, rootDir, restoreDir, snapshotTwo);
+    flipComactions(true);
+    String snapshotThree = tableName.getNameAsString() + "-snapshot-three";
+    createAndAssertSnapshot(tableName, snapshotThree);

Review Comment:
   never mind, I see the FileNotFoundException in the surefire-reports output
   ```
   2025-06-09T09:21:12,815 ERROR [RS_SNAPSHOT_OPERATIONS-regionserver/localhost:0-2 {event_type=RS_SNAPSHOT_REGIONS, pid=30}] handler.RSProcedureHandler(58): pid=30
   java.io.FileNotFoundException: File does not exist: hdfs://localhost:53558/user/david.manning/test-data/f20e0b01-3003-4b21-96ff-e3bc7ced0a0c/data/default/testMultiSnapshotRestoreWithMerge/7474bec476ee394571fbd58c96e4ad18/A/173119c97b5f45b2800e34048ba08fca
        at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1832) ~[hadoop-hdfs-client-3.4.1.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1825) ~[hadoop-hdfs-client-3.4.1.jar:?]
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.4.1.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1840) ~[hadoop-hdfs-client-3.4.1.jar:?]
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462) ~[hadoop-common-3.4.1.jar:?]
        at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:387) ~[classes/:?]
        at org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:130) ~[classes/:?]
        at org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:66) ~[classes/:?]
        at org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:267) ~[classes/:?]
        at org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:235) ~[classes/:?]
        at org.apache.hadoop.hbase.regionserver.HRegion.addRegionToSnapshot(HRegion.java:5293) ~[classes/:?]
   ```
   and this continues for more than 10 minutes
   ```
   2025-06-09T09:33:58,615 DEBUG [RpcServer.priority.RWQ.Fifo.read.handler=1,queue=1,port=53567 {}] master.HMaster(4202): Remote procedure failed, pid=30
   org.apache.hadoop.hbase.procedure2.RemoteProcedureException: java.io.FileNotFoundException: File does not exist: hdfs://localhost:53558/user/david.manning/test-data/f20e0b01-3003-4b21-96ff-e3bc7ced0a0c/data/default/testMultiSnapshotRestoreWithMerge/7474bec476ee394571fbd58c96e4ad18/A/173119c97b5f45b2800e34048ba08fca
        at org.apache.hadoop.hbase.procedure2.RemoteProcedureException.fromProto(RemoteProcedureException.java:123) ~[hbase-procedure-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
        at org.apache.hadoop.hbase.master.MasterRpcServices.lambda$reportProcedureDone$5(MasterRpcServices.java:2578) ~[classes/:?]
        at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
        at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1092) ~[?:?]
        at org.apache.hadoop.hbase.master.MasterRpcServices.reportProcedureDone(MasterRpcServices.java:2573) ~[classes/:?]
        at org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16726) ~[hbase-protocol-shaded-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
   ```
   
   I don't know if there is a way to make it fail faster, or do fewer retries, or bubble up the FileNotFoundException to the test failure instead of it being a timeout/interrupt, but that would be nice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
