[
https://issues.apache.org/jira/browse/HUDI-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678965#comment-17678965
]
Teng Huo commented on HUDI-4717:
--------------------------------
Hi,
We recently hit exactly the same issue in our Flink MOR pipeline.
!issue.png!
I have checked the Hudi files, and all compaction operations appear to have
completed, because the parquet files are good. I can't understand how events
are lost between compact_task and compact_commit.
Is there any way to troubleshoot this issue? Many
thanks.
> CompactionCommitEvent message corrupted when sent by compact_task
> ------------------------------------------------------------------
>
> Key: HUDI-4717
> URL: https://issues.apache.org/jira/browse/HUDI-4717
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink, flink-sql
> Affects Versions: 0.10.1
> Reporter: nonggia.liang
> Priority: Major
> Labels: pull-request-available
> Attachments: figure 1.png, figure 2.png, issue.png
>
>
> When running a Flink application that inserts data into a Hudi table with
> async compaction enabled, we found that after running for some time,
> compactions became abnormal: they were scheduled and executed successfully,
> but not committed. We also observed a mismatch between the number of messages
> sent by compact_task and the number received by compact_commit, as shown in
> figure 1.
> By inspecting the abnormal InputChannel state of the compact_commit
> operator with the Arthas tool, we found that the channel is waiting for a
> `huge` message of size 16M, which is far larger than a normal
> CompactionCommitEvent object, as shown in figure 2.
> Currently, in the processElement() method of class CompactFunction, we use
> the collector to send the CompactionCommitEvent message asynchronously, but
> the Collector provided by Flink does not seem to be thread-safe. Could that
> be the cause of the corruption of the message received by the compact_commit
> operator? Should we use the MailboxExecutorAdapter to run collector.collect,
> just like in StreamReadOperator?
>
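The idea proposed in the description (do the work asynchronously, but call the collector only from the task's own thread) can be modelled with the plain JDK. This is a minimal sketch under stated assumptions, not Hudi's or Flink's actual code: SingleThreadedEmitSketch, UnsafeCollector, runCompactions and the "commit-event-N" strings are hypothetical names, and a simple queue stands in for the mailbox instead of Flink's real mailbox classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadedEmitSketch {

    /** Hypothetical stand-in for a collector that is NOT thread-safe. */
    static final class UnsafeCollector {
        private final Thread owner = Thread.currentThread();
        private final List<String> out = new ArrayList<>();

        void collect(String event) {
            // Fail loudly if anything but the owning thread emits.
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("collect() called off the owner thread");
            }
            out.add(event);
        }
    }

    /** Runs the given number of fake compactions and returns the emitted events. */
    public static List<String> runCompactions(int tasks) {
        UnsafeCollector collector = new UnsafeCollector();
        // The "mailbox": workers enqueue emit actions, the owner thread runs them.
        ConcurrentLinkedQueue<Runnable> mailbox = new ConcurrentLinkedQueue<>();
        ExecutorService workers = Executors.newFixedThreadPool(4);

        for (int i = 0; i < tasks; i++) {
            final int id = i;
            workers.execute(() -> {
                // Pretend the compaction ran here, on a worker thread.
                String event = "commit-event-" + id;
                // Do not touch the collector here; hand the emit back instead.
                mailbox.add(() -> collector.collect(event));
            });
        }

        workers.shutdown();
        try {
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // Drain the mailbox on the owner thread, so collect() is single-threaded.
        Runnable mail;
        while ((mail = mailbox.poll()) != null) {
            mail.run();
        }
        return collector.out;
    }

    public static void main(String[] args) {
        System.out.println("emitted " + runCompactions(8).size() + " events");
    }
}
```

In this model the emission order depends on worker scheduling, so only the set of events is deterministic. A real operator would drain the mailbox continuously rather than once at the end.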
--
This message was sent by Atlassian Jira
(v8.20.10#820010)