[
https://issues.apache.org/jira/browse/HUDI-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Danny Chen closed HUDI-5516.
----------------------------
Resolution: Fixed
Fixed via master branch: e62b9da66b6dbf7838fc82aca2a5644e4aad7e59
> Reduce memory footprint on workload with thousand active partitions
> -------------------------------------------------------------------
>
> Key: HUDI-5516
> URL: https://issues.apache.org/jira/browse/HUDI-5516
> Project: Apache Hudi
> Issue Type: Improvement
> Components: flink
> Reporter: Alexander Trushev
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.13.0
>
>
> We can reduce the memory footprint of a workload with thousands of active
> partitions between checkpoints. This workload arises when the checkpoint
> interval is wide. More specifically, an active partition here is a special
> case of an active fileId. The write client holds a map of write handles in
> order to create a ReplaceHandle between checkpoints, which leads to an
> OutOfMemoryError on this workload because each write handle is a large
> object.
> {code:sql}
> create table source (
>   `id` int,
>   `data` string
> ) with (
>   'connector' = 'datagen',
>   'rows-per-second' = '100',
>   'fields.id.kind' = 'sequence',
>   'fields.id.start' = '0',
>   'fields.id.end' = '3000'
> );
>
> create table sink (
>   `id` int primary key,
>   `data` string,
>   `part` string
> ) partitioned by (`part`) with (
>   'connector' = 'hudi',
>   'path' = '/tmp/sink',
>   'write.batch.size' = '0.001', -- 1024 bytes
>   'write.task.max.size' = '101.001', -- 101.001MB
>   'write.merge.max_memory' = '1' -- 1024 bytes
> );
>
> insert into sink
> select `id`, `data`, concat('part', cast(`id` as string)) as `part`
> from source;
> {code}
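> The idea can be sketched as follows. This is a hypothetical illustration,
> not Hudi's actual classes or the merged fix: instead of retaining the full
> heavyweight write handle per fileId between checkpoints, retain only the
> small metadata later needed to build a ReplaceHandle, so the handles
> themselves become garbage-collectable.
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> // Hypothetical sketch (names are illustrative, not Hudi APIs).
> public class WriteClientSketch {
>     // Small immutable record of what a later ReplaceHandle would need.
>     static final class HandleMetadata {
>         final String fileId;
>         final String filePath;
>         HandleMetadata(String fileId, String filePath) {
>             this.fileId = fileId;
>             this.filePath = filePath;
>         }
>     }
>
>     // Before: Map<fileId, heavy write handle> grows with every active
>     // fileId between checkpoints. After: only lightweight metadata stays.
>     private final Map<String, HandleMetadata> handleMeta = new HashMap<>();
>
>     void onHandleClosed(String fileId, String filePath) {
>         handleMeta.put(fileId, new HandleMetadata(fileId, filePath));
>     }
>
>     void onCheckpointComplete() {
>         handleMeta.clear(); // drop references; nothing heavy survives
>     }
>
>     int cachedEntries() {
>         return handleMeta.size();
>     }
> }
> {code}
> With thousands of active partitions, the map above then costs a few dozen
> bytes per entry rather than one large handle object per entry.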
--
This message was sent by Atlassian Jira
(v8.20.10#820010)