[
https://issues.apache.org/jira/browse/IMPALA-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956643#comment-17956643
]
Michael Smith commented on IMPALA-13303:
----------------------------------------
Do we need to worry about
https://github.com/apache/impala/blob/4.5.0/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java#L386-L387?
> File listing could still be recursive even if
> impala.disable.recursive.listing is true
> --------------------------------------------------------------------------------------
>
> Key: IMPALA-13303
> URL: https://issues.apache.org/jira/browse/IMPALA-13303
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
> Fix For: Impala 4.5.0
>
>
> During the development of IMPALA-13117, I found that the table property
> "impala.disable.recursive.listing" is not respected during the initial
> metadata loading, i.e. a load that is not a reload triggered by REFRESH or
> HMS events.
> To reproduce the issue, rewrite this test statement from REFRESH to
> INVALIDATE METADATA:
> https://github.com/apache/impala/blob/0a45cb5ae6d1345a7d531c22d174c99ea7cedea0/tests/metadata/test_recursive_listing.py#L126
> The test should still pass but it actually fails.
> A simpler way to reproduce the issue is:
> {code:sql}
> create table my_tbl (i int) stored as textfile
> tblproperties('impala.disable.recursive.listing'='true');
> describe formatted my_tbl; -- Get the table location, e.g.
> -- hdfs://localhost:20500/test-warehouse/my_tbl
> {code}
> Upload 3 files to that table location: dir1/data.txt, dir2/data.txt, data.txt.
> {code}
> echo 1 > data.txt
> hdfs dfs -mkdir hdfs://localhost:20500/test-warehouse/my_tbl/dir1
> hdfs dfs -mkdir hdfs://localhost:20500/test-warehouse/my_tbl/dir2
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/my_tbl/
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/my_tbl/dir1
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/my_tbl/dir2
> {code}
> Then refresh the table and show the files:
> {code:sql}
> refresh my_tbl;
> show files in my_tbl;
> +------------------------------------------------------------+------+-----------+-----------+
> | Path                                                       | Size | Partition | EC Policy |
> +------------------------------------------------------------+------+-----------+-----------+
> | hdfs://localhost:20500/test-warehouse/my_tbl/data.txt      | 2B   |           | NONE      |
> | hdfs://localhost:20500/test-warehouse/my_tbl/dir1/data.txt | 2B   |           | NONE      |
> | hdfs://localhost:20500/test-warehouse/my_tbl/dir2/data.txt | 2B   |           | NONE      |
> +------------------------------------------------------------+------+-----------+-----------+
> {code}
> Only the first file, which is directly under the table folder, should be shown
> in the results. The other two files are in subdirectories, so they should be
> ignored since recursive listing is disabled.
> This feature was added in IMPALA-8454. Though it is rarely used in production,
> it'd be nice to fix it.
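
For reference, the expected difference between recursive and non-recursive listing can be sketched on a local filesystem. This is only an analogy under stated assumptions: it does not touch HDFS or Impala, and the helper names `list_data_files` and `make_layout` are hypothetical, introduced here purely for illustration. It recreates the three-file layout from the reproduction steps in a temporary directory and lists it both ways.

```python
import os
import tempfile


def list_data_files(table_dir, recursive):
    """Return file paths relative to table_dir, sorted, using '/' separators.

    Hypothetical helper for illustration only; this is not Impala code.
    """
    if recursive:
        found = []
        for root, _dirs, files in os.walk(table_dir):
            for name in files:
                rel = os.path.relpath(os.path.join(root, name), table_dir)
                found.append(rel.replace(os.sep, "/"))
        return sorted(found)
    # Non-recursive: keep only regular files directly under the table directory.
    with os.scandir(table_dir) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())


def make_layout(table_dir):
    """Recreate the layout from the steps above: data.txt, dir1/, dir2/."""
    for rel in ("data.txt", "dir1/data.txt", "dir2/data.txt"):
        path = os.path.join(table_dir, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("1\n")  # same content as `echo 1 > data.txt`


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        make_layout(table_dir)
        print(list_data_files(table_dir, recursive=False))
        print(list_data_files(table_dir, recursive=True))
```

With recursive listing disabled, only `data.txt` is returned, which corresponds to the single row the issue expects from `show files in my_tbl`; the recursive call returns all three files, which corresponds to the output actually observed.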
--
This message was sent by Atlassian Jira
(v8.20.10#820010)