[ https://issues.apache.org/jira/browse/IMPALA-12867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Rozsa updated IMPALA-12867:
---------------------------------
Fix Version/s: Impala 4.5.0
> Filter files to OPTIMIZE based on file size
> -------------------------------------------
>
> Key: IMPALA-12867
> URL: https://issues.apache.org/jira/browse/IMPALA-12867
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Noemi Pap-Takacs
> Assignee: Noemi Pap-Takacs
> Priority: Major
> Labels: impala-iceberg
> Fix For: Impala 4.5.0
>
>
> {{'OPTIMIZE TABLE <table_name>'}} rewrites all files of the table regardless
> of size and type, even if the table does not contain any small or delete
> files.
> With the {{FILE_SIZE_THRESHOLD_MB}} option, the user should be able to specify
> a file size limit so that only small files are rewritten.
> Syntax:
> {code:sql}
> OPTIMIZE TABLE <table_name> (FILE_SIZE_THRESHOLD_MB=100);
> {code}
> The threshold is a file size in megabytes (MB). Data files larger than the
> given limit are rewritten only if they are referenced by delete deltas
> (delete files).
> Note that if {{FILE_SIZE_THRESHOLD_MB}} is set, only the selected files are
> rewritten according to the latest schema and partition spec. The data files
> left intact might therefore still have an older schema or partition layout.
> Use {{'OPTIMIZE TABLE <table_name>'}} to rewrite the entire table according
> to the latest schema and partition layout.
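The file selection rule described above can be sketched in Java as follows. This is a minimal illustration, not Impala's actual implementation: the class {{OptimizeFileFilter}}, the record {{DataFile}} and the method {{selectFilesToRewrite}} are hypothetical names, "MB" is assumed to mean 1024 * 1024 bytes, and a file is assumed to count as "small" only when it is strictly below the threshold. The issue text does not specify either detail.

```java
import java.util.ArrayList;
import java.util.List;

public class OptimizeFileFilter {
    // Hypothetical, simplified view of a data file; not an Impala or Iceberg class.
    record DataFile(String path, long sizeBytes, boolean referencedByDeletes) {}

    // Assumption: the threshold unit is a binary megabyte (MiB).
    static final long BYTES_PER_MB = 1024L * 1024L;

    // Returns the data files that OPTIMIZE with FILE_SIZE_THRESHOLD_MB would rewrite
    // under the assumptions above: files below the threshold are always selected,
    // larger files only when delete deltas reference them.
    static List<DataFile> selectFilesToRewrite(List<DataFile> files, long thresholdMb) {
        long thresholdBytes = thresholdMb * BYTES_PER_MB;
        List<DataFile> selected = new ArrayList<>();
        for (DataFile f : files) {
            boolean small = f.sizeBytes() < thresholdBytes;
            if (small || f.referencedByDeletes()) {
                selected.add(f);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<DataFile> files = List.of(
            new DataFile("a.parq", 10 * BYTES_PER_MB, false),   // small: rewritten
            new DataFile("b.parq", 500 * BYTES_PER_MB, false),  // large, no deletes: left intact
            new DataFile("c.parq", 300 * BYTES_PER_MB, true));  // large, has deletes: rewritten
        for (DataFile f : selectFilesToRewrite(files, 100)) {
            System.out.println(f.path());
        }
    }
}
```

With a threshold of 100, the sketch prints a.parq and c.parq. b.parq is left as it is, which is why such a file can keep an older schema or partition layout after a thresholded OPTIMIZE.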
--
This message was sent by Atlassian Jira
(v8.20.10#820010)