[ https://issues.apache.org/jira/browse/HBASE-28915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17899475#comment-17899475 ]
yuting sun edited comment on HBASE-28915 at 11/19/24 12:35 PM:
---------------------------------------------------------------

The problem has been located. A minor compaction failed and left dirty data files behind in the data/ns/table/family path. When the region is reopened, it reloads these dirty files, which results in ghost data.

[Screenshot: image-2024-11-19-20-34-55-052.png]

My proposed fix: after a minor compaction, if its status is "Aborted compaction", archive the dirty data files it generated to the archive directory. If archiving fails for reasons outside our control, a secondary safeguard applies the next time a compaction runs: check whether the files loaded by the store include all of the files in the data directory. If they do not, reload all files in the data directory before compacting, so that no ghost data is generated.

> Ghost data Issue: deleted data reappears after some time
> --------------------------------------------------------
>
>                 Key: HBASE-28915
>                 URL: https://issues.apache.org/jira/browse/HBASE-28915
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction, Deletes, regionserver
>    Affects Versions: 0.98.0, 2.3.0
>            Reporter: yuting sun
>            Priority: Major
>         Attachments: image-2024-11-19-20-34-55-052.png
>
>
> Encountered a ghost data issue in the online production environment: A
> business batch request simultaneously includes put and delete operations.
> Deleted data has a very small probability of reappearing after some time, and
> the problematic data is within a single region. The primary cluster has a
> backup cluster, and after the problematic rowkey appears in the primary
> cluster, it cannot be found in the backup cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
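The two-step fix proposed in the comment above can be sketched in Java as follows. This is a minimal model under stated assumptions, not HBase code: store files are represented as plain names in in-memory sets, "reload" simply replaces the loaded set with the data directory listing, and `GhostDataSafeguard`, `cleanUpAfterCompaction`, and `selectFilesForCompaction` are hypothetical names, not HBase APIs. Step 1 archives the outputs of an aborted compaction; step 2 is the fallback check run before the next compaction.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the proposed safeguard.
// None of these names are HBase APIs; store files are modeled as file names.
class GhostDataSafeguard {

    enum CompactionStatus { COMPLETED, ABORTED }

    // Files present under data/ns/table/family.
    final Set<String> dataDir = new HashSet<>();
    // Files moved to the archive directory.
    final Set<String> archiveDir = new HashSet<>();
    // Files the store has loaded and serves reads from.
    final Set<String> loadedFiles = new HashSet<>();

    // Step 1: if the compaction aborted, archive the output files it wrote.
    // archiveOk is a hypothetical flag simulating whether archiving succeeds.
    // Returns true when no dirty output is left in the data directory.
    boolean cleanUpAfterCompaction(CompactionStatus status, List<String> outputs,
                                   boolean archiveOk) {
        if (status != CompactionStatus.ABORTED) {
            return true; // completed compaction: nothing to clean up
        }
        if (!archiveOk) {
            return false; // dirty files stay in dataDir; step 2 must catch them
        }
        for (String f : outputs) {
            if (dataDir.remove(f)) {
                archiveDir.add(f);
            }
        }
        return true;
    }

    // Step 2: before the next compaction, if the data directory holds files the
    // store has not loaded, reload every file in the directory so the
    // compaction sees all of them, including leftovers from step 1.
    Set<String> selectFilesForCompaction() {
        if (!loadedFiles.containsAll(dataDir)) {
            loadedFiles.clear();
            loadedFiles.addAll(dataDir);
        }
        return new HashSet<>(loadedFiles);
    }

    public static void main(String[] args) {
        GhostDataSafeguard store = new GhostDataSafeguard();
        store.dataDir.addAll(List.of("hfile-1", "hfile-2"));
        store.loadedFiles.addAll(store.dataDir);

        // A minor compaction writes hfile-3, then aborts, and archiving fails.
        store.dataDir.add("hfile-3");
        boolean clean = store.cleanUpAfterCompaction(
                CompactionStatus.ABORTED, List.of("hfile-3"), false);

        // The next compaction notices hfile-3 and reloads all files.
        Set<String> selected = store.selectFilesForCompaction();
        System.out.println(clean + " " + selected.contains("hfile-3"));
    }
}
```

Running `main` prints `false true`: archiving fails, so the leftover file stays in the data directory, and the step-2 check picks it up before the next compaction. In this sketch the step-2 check only changes anything when step 1 could not remove the dirty files.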