[ 
https://issues.apache.org/jira/browse/HBASE-28915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17899475#comment-17899475
 ] 

yuting sun edited comment on HBASE-28915 at 11/19/24 12:37 PM:
---------------------------------------------------------------

The problem has been located. A minor compaction failed and left dirty data 
files behind in the data/ns/table/family path. When the region is reopened, 
it reloads these dirty data files, which produces the ghost data.

There should be no file dated earlier than the file created by the major 
compaction:
!image-2024-11-19-20-34-55-052.png|width=914,height=116!
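
To make that expectation concrete, here is a minimal sketch of such a check. 
It is only an illustration: the class, the method, and the map of file names 
to timestamps are hypothetical and are not HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the names here are illustrative and are not HBase APIs.
final class StaleFileCheck {

    // Returns, sorted by name, the files whose timestamp is earlier than the
    // timestamp of the file written by the major compaction.
    static List<String> olderThan(Map<String, Long> fileTimestamps,
                                  String majorCompactedFile) {
        Long cutoff = fileTimestamps.get(majorCompactedFile);
        if (cutoff == null) {
            throw new IllegalArgumentException(
                    "unknown file: " + majorCompactedFile);
        }
        List<String> stale = new ArrayList<>();
        // TreeMap gives a deterministic, name-sorted iteration order.
        for (Map.Entry<String, Long> e
                : new TreeMap<>(fileTimestamps).entrySet()) {
            if (e.getValue() < cutoff) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }
}
```

Any name returned by olderThan is a candidate dirty file that deserves a 
closer look.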

My idea is that after a minor compaction, if the compaction status is 
aborted, I will archive the dirty data files it generated to the archive 
directory. If archiving fails for reasons beyond our control, a secondary 
safeguard runs the next time a compaction is executed: it checks whether the 
data directory contains files that the store has not loaded. If it does, all 
files in the data directory are reloaded before the compaction is performed, 
to avoid generating ghost data.
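
A rough sketch of the two steps, to make the proposal easier to discuss. It 
is not a patch: the class and method names are hypothetical, it uses plain 
java.nio.file calls on a local file system instead of the HBase file system 
and archiver classes, and file names stand in for store files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names are HBase APIs.
final class AbortedCompactionCleanup {

    // Step 1: move the output files of an aborted compaction into the
    // archive directory. Returns the files that could not be moved.
    static List<Path> archiveOutputs(List<Path> outputs, Path archiveDir) {
        List<Path> failed = new ArrayList<>();
        for (Path file : outputs) {
            try {
                Files.createDirectories(archiveDir);
                Files.move(file, archiveDir.resolve(file.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                // The file stays in the data directory; step 2 must catch it.
                failed.add(file);
            }
        }
        return failed;
    }

    // Step 2: before the next compaction, list the files that are present in
    // the family directory but were not loaded by the store.
    static Set<String> untrackedFiles(Set<String> loadedByStore,
                                      Set<String> inFamilyDir) {
        Set<String> untracked = new TreeSet<>(inFamilyDir);
        untracked.removeAll(loadedByStore);
        return untracked;
    }
}
```

If untrackedFiles returns a non-empty set, the store would reload all files 
from the data directory before it selects files for compaction.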

Comments and discussion are welcome.



> Ghost data Issue: deleted data reappears after some time
> --------------------------------------------------------
>
>                 Key: HBASE-28915
>                 URL: https://issues.apache.org/jira/browse/HBASE-28915
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction, Deletes, regionserver
>    Affects Versions: 0.98.0, 2.3.0
>            Reporter: yuting sun
>            Priority: Major
>         Attachments: image-2024-11-19-20-34-55-052.png
>
>
> We encountered a ghost data issue in the online production environment. A 
> business batch request includes put and delete operations at the same time. 
> With very low probability, deleted data reappears after some time, and the 
> problematic data lies within a single region. The primary cluster has a 
> backup cluster; after the problematic rowkey reappears in the primary 
> cluster, it cannot be found in the backup cluster.



