[ https://issues.apache.org/jira/browse/HUDI-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-3637:
----------------------------
    Description: 
HoodieMetadataTableValidator's validation of the latest base files and file 
slices fails as shown below: compared to the FS view, the file slices from the 
MT are missing log files.  The validation failure may be due to the inflight 
compaction.  Need to investigate whether this affects the file listing for 
write operations.  After some more instants, the validation passes, so the 
correctness of the MT is guaranteed, but the file listing view may have a bug.
{code:java}
file slices from metadata: [FileSlice 
{fileGroupId=HoodieFileGroupId{partitionPath='2022/1/28', 
fileId='769bf7ac-d6d0-452c-bf54-bbe7e8381766-0'}, 
baseCommitTime=20220314001058266, 
baseFile='HoodieBaseFile{fullPath=file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_c_multi_writer/c2_mor_010nomt_011mt/test_table/2022/1/28/769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_2-47-485_20220314001058266.parquet,
 fileLen=106839698, BootstrapBaseFile=null}', logFiles='[]'}]
file slices from file system and base files: [FileSlice 
{fileGroupId=HoodieFileGroupId{partitionPath='2022/1/28', 
fileId='769bf7ac-d6d0-452c-bf54-bbe7e8381766-0'}, 
baseCommitTime=20220314001058266, 
baseFile='HoodieBaseFile{fullPath=file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_c_multi_writer/c2_mor_010nomt_011mt/test_table/2022/1/28/769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_2-47-485_20220314001058266.parquet,
 fileLen=106839698, BootstrapBaseFile=null}', 
logFiles='[HoodieLogFile{pathStr='file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_c_multi_writer/c2_mor_010nomt_011mt/test_table/2022/1/28/.769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_20220314001058266.log.1_2-111-954',
 fileLen=51607682}]'}]
22/03/14 00:33:03 ERROR HoodieMetadataTableValidator: Metadata table validation 
failed for 2022/1/28 due to HoodieValidationException {code}
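A minimal sketch of the kind of comparison that fails here, under a simplified model. The `Slice` record and the `FileSliceDiff` class are hypothetical stand-ins, not Hudi's `FileSlice` or `HoodieFileGroupId`, and the real HoodieMetadataTableValidator logic may differ. The sketch indexes the latest slices of one partition by file ID and reports log files present in only one listing. When fed the values from the log above, it flags `.log.1_2-111-954` as present only in the FS listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified model of the failing check: these types are NOT
// Hudi's FileSlice/HoodieFileGroupId classes, and the real validator may differ.
public class FileSliceDiff {

  /** Stand-in for the latest file slice of one file group in one partition. */
  record Slice(String fileId, String baseCommitTime, String baseFile, Set<String> logFiles) {}

  /** Returns one message per file group whose slices differ between the two listings. */
  static List<String> diff(List<Slice> fromMetadata, List<Slice> fromFileSystem) {
    Map<String, Slice> mt = byFileId(fromMetadata);
    Map<String, Slice> fs = byFileId(fromFileSystem);
    TreeSet<String> fileIds = new TreeSet<>(mt.keySet());
    fileIds.addAll(fs.keySet());

    List<String> problems = new ArrayList<>();
    for (String id : fileIds) {
      Slice a = mt.get(id);
      Slice b = fs.get(id);
      if (a == null || b == null) {
        problems.add(id + ": file group in only one listing");
      } else if (!a.baseCommitTime().equals(b.baseCommitTime())
          || !Objects.equals(a.baseFile(), b.baseFile())) { // baseFile may be null for log-only slices
        problems.add(id + ": base file differs");
      } else if (!a.logFiles().equals(b.logFiles())) {
        // Set equality ignores ordering; sorted copies give a stable message.
        TreeSet<String> onlyInFs = new TreeSet<>(b.logFiles());
        onlyInFs.removeAll(a.logFiles());
        TreeSet<String> onlyInMt = new TreeSet<>(a.logFiles());
        onlyInMt.removeAll(b.logFiles());
        problems.add(id + ": log files only in FS " + onlyInFs + ", only in MT " + onlyInMt);
      }
    }
    return problems;
  }

  private static Map<String, Slice> byFileId(List<Slice> slices) {
    Map<String, Slice> byId = new TreeMap<>();
    for (Slice s : slices) {
      byId.put(s.fileId(), s);
    }
    return byId;
  }

  public static void main(String[] args) {
    // Values copied from the validator log above.
    String fileId = "769bf7ac-d6d0-452c-bf54-bbe7e8381766-0";
    String base = fileId + "_2-47-485_20220314001058266.parquet";
    String log = "." + fileId + "_20220314001058266.log.1_2-111-954";
    Slice fromMt = new Slice(fileId, "20220314001058266", base, Set.of());
    Slice fromFs = new Slice(fileId, "20220314001058266", base, Set.of(log));
    diff(List.of(fromMt), List.of(fromFs)).forEach(System.out::println);
  }
}
```

Comparing log files as sets rather than ordered lists avoids reporting a mismatch when both listings return the same files in a different order.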
Compaction:
{code:java}
Partition Path    │ 2022/1/28
FileId            │ 769bf7ac-d6d0-452c-bf54-bbe7e8381766-0
Base-Instant      │ 20220314001058266
Data File Path    │ 769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_2-47-485_20220314001058266.parquet
Total Delta Files │ 1
getMetrics        │ {TOTAL_LOG_FILES=1.0, TOTAL_IO_READ_MB=151.0, TOTAL_LOG_FILES_SIZE=5.1607682E7, TOTAL_IO_WRITE_MB=101.0, TOTAL_IO_MB=252.0}
{code}
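The numbers in the compaction row can be cross-checked against the validator log. `TOTAL_LOG_FILES_SIZE=5.1607682E7` equals the `fileLen=51607682` of the log file that only the FS listing returns. This suggests the pending compaction plan covers exactly the log file missing from the MT view. The IO metrics also line up under an assumed set of formulas: read IO = base file + log files, write IO = base file, bytes converted to MB by integer division by 1024 * 1024, and total = read + write. These formulas are inferred from the numbers here, not taken from Hudi's source. The class and method names below are hypothetical.

```java
// Hypothetical cross-check of the compaction metrics above. The formulas are an
// ASSUMPTION inferred from the reported numbers, not Hudi's actual implementation.
public class CompactionMetricsCheck {

  static final long BYTES_PER_MB = 1024L * 1024L;

  /** Bytes to MB; integer division truncates the fractional part. */
  static long toMb(long bytes) {
    return bytes / BYTES_PER_MB;
  }

  /** Returns {readMb, writeMb, totalMb} for one base file plus its log files. */
  static long[] ioMetrics(long baseFileBytes, long... logFileBytes) {
    long logTotal = 0;
    for (long b : logFileBytes) {
      logTotal += b;
    }
    long readMb = toMb(baseFileBytes + logTotal); // assumed: read base + all logs
    long writeMb = toMb(baseFileBytes);           // assumed: write ~ base file size
    return new long[] {readMb, writeMb, readMb + writeMb};
  }

  public static void main(String[] args) {
    long baseFile = 106_839_698L; // fileLen of the .parquet base file
    long logFile = 51_607_682L;   // fileLen of .log.1_2-111-954 (only in the FS listing)
    long[] m = ioMetrics(baseFile, logFile);
    System.out.println("TOTAL_IO_READ_MB=" + m[0]
        + ", TOTAL_IO_WRITE_MB=" + m[1]
        + ", TOTAL_IO_MB=" + m[2]);
    // prints TOTAL_IO_READ_MB=151, TOTAL_IO_WRITE_MB=101, TOTAL_IO_MB=252
  }
}
```

Here 106839698 + 51607682 = 158447380 bytes, which truncates to 151 MB. The base file alone, 106839698 bytes, truncates to 101 MB, and 151 + 101 = 252, matching the reported `getMetrics` values.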

  was:
HoodieMetadataTableValidator's validation of the latest base files and file 
slices fails as shown below.  The validation failure may be due to the 
inflight compaction.  Need to investigate whether this affects the file listing 
for write operations.  After some more instants, the validation passes, so the 
correctness of the MT is guaranteed, but the file listing view may have a bug.
{code:java}
file slices from metadata: [FileSlice 
{fileGroupId=HoodieFileGroupId{partitionPath='2022/1/28', 
fileId='769bf7ac-d6d0-452c-bf54-bbe7e8381766-0'}, 
baseCommitTime=20220314001058266, 
baseFile='HoodieBaseFile{fullPath=file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_c_multi_writer/c2_mor_010nomt_011mt/test_table/2022/1/28/769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_2-47-485_20220314001058266.parquet,
 fileLen=106839698, BootstrapBaseFile=null}', logFiles='[]'}]
file slices from file system and base files: [FileSlice 
{fileGroupId=HoodieFileGroupId{partitionPath='2022/1/28', 
fileId='769bf7ac-d6d0-452c-bf54-bbe7e8381766-0'}, 
baseCommitTime=20220314001058266, 
baseFile='HoodieBaseFile{fullPath=file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_c_multi_writer/c2_mor_010nomt_011mt/test_table/2022/1/28/769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_2-47-485_20220314001058266.parquet,
 fileLen=106839698, BootstrapBaseFile=null}', 
logFiles='[HoodieLogFile{pathStr='file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_c_multi_writer/c2_mor_010nomt_011mt/test_table/2022/1/28/.769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_20220314001058266.log.1_2-111-954',
 fileLen=51607682}]'}]
22/03/14 00:33:03 ERROR HoodieMetadataTableValidator: Metadata table validation 
failed for 2022/1/28 due to HoodieValidationException {code}
Compaction:
{code:java}
Partition Path    │ 2022/1/28
FileId            │ 769bf7ac-d6d0-452c-bf54-bbe7e8381766-0
Base-Instant      │ 20220314001058266
Data File Path    │ 769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_2-47-485_20220314001058266.parquet
Total Delta Files │ 1
getMetrics        │ {TOTAL_LOG_FILES=1.0, TOTAL_IO_READ_MB=151.0, TOTAL_LOG_FILES_SIZE=5.1607682E7, TOTAL_IO_WRITE_MB=101.0, TOTAL_IO_MB=252.0}
{code}


> Check file listing from FS vs metadata table when compaction is pending and 
> inflight
> ------------------------------------------------------------------------------------
>
>                 Key: HUDI-3637
>                 URL: https://issues.apache.org/jira/browse/HUDI-3637
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Ethan Guo
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
