[
https://issues.apache.org/jira/browse/IMPALA-12867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Noémi Pap-Takács updated IMPALA-12867:
--------------------------------------
Description:
{{'OPTIMIZE TABLE <table_name>'}} rewrites all files of the table regardless of
size and type, even if the table does not contain any small or delete files.
With the {{'FILE_SIZE_THRESHOLD_MB'}} option, the user should be able to
specify a file size limit to rewrite only small files.
Syntax:
{code:sql}
OPTIMIZE TABLE <table_name> (FILE_SIZE_THRESHOLD_MB=100);
{code}
The threshold value is a file size in MB. Data files larger than the given
limit are rewritten only if they are referenced from delete deltas.
Note that if {{'FILE_SIZE_THRESHOLD_MB'}} is set, only the selected files are
rewritten according to the latest schema and partition spec. Therefore, the
untouched data files might still have an older schema or partition layout. Use
{{'OPTIMIZE TABLE <table_name>'}} without the option to rewrite the entire
table according to the latest schema and partition layout.
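The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the intended behavior, not Impala's implementation: the names DataFile, has_deletes and select_files_to_rewrite are hypothetical, 1 MB is assumed to be 1024 * 1024 bytes, and a file exactly at the limit is assumed not to count as small.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

BYTES_PER_MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file entry of the table."""
    path: str
    size_bytes: int
    has_deletes: bool  # True if a delete file references this data file


def select_files_to_rewrite(files: Iterable[DataFile],
                            threshold_mb: Optional[int] = None) -> List[DataFile]:
    """Return the data files that OPTIMIZE would rewrite under this sketch."""
    if threshold_mb is None:
        # No threshold given: the whole table is rewritten.
        return list(files)
    limit = threshold_mb * BYTES_PER_MB
    # Small files are always selected; larger files are selected only
    # when they are referenced from delete files.
    # Assumption: a file exactly at the limit is not treated as small.
    return [f for f in files if f.size_bytes < limit or f.has_deletes]
```

With a threshold of 100, a 10 MB file and a 200 MB file that has deletes would both be selected, while a 200 MB file without deletes would be left as is and could keep an older schema or partition layout.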
was:
{{'OPTIMIZE TABLE <table_name>'}} rewrites all files of the table regardless of
size and type, even if the table does not contain any small or delete files.
With the {{'FILE_SIZE_THRESHOLD'}} option, the user should be able to specify a
file size limit to rewrite only small files.
Syntax:
{code:sql}
OPTIMIZE TABLE <table_name> (FILE_SIZE_THRESHOLD_MB=100);
{code}
The threshold value is a file size in MB. Data files larger than the given
limit are rewritten only if they are referenced from delete deltas.
Note that if {{'FILE_SIZE_THRESHOLD'}} is set, only the selected files are
rewritten according to the latest schema and partition spec. Therefore, the
untouched data files might still have an older schema or partition layout. Use
{{'OPTIMIZE TABLE <table_name>'}} without the option to rewrite the entire
table according to the latest schema and partition layout.
> Filter files to OPTIMIZE based on file size
> -------------------------------------------
>
> Key: IMPALA-12867
> URL: https://issues.apache.org/jira/browse/IMPALA-12867
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Noémi Pap-Takács
> Assignee: Noémi Pap-Takács
> Priority: Major
> Labels: impala-iceberg
> Fix For: Impala 4.5.0