[ https://issues.apache.org/jira/browse/IMPALA-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956643#comment-17956643 ]

Michael Smith commented on IMPALA-13303:
----------------------------------------

Do we need to worry about the following lines?
https://github.com/apache/impala/blob/4.5.0/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java#L386-L387

> File listing could still be recursive even if impala.disable.recursive.listing is true
> --------------------------------------------------------------------------------------
>
>                 Key: IMPALA-13303
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13303
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>             Fix For: Impala 4.5.0
>
>
> During the development of IMPALA-13117, I found that the table property
> "impala.disable.recursive.listing" is not respected during the initial
> metadata loading, i.e. when the table is loaded for the first time rather than
> reloaded by REFRESH or HMS events.
> To reproduce the issue, change the following test statement from REFRESH to
> INVALIDATE METADATA:
> https://github.com/apache/impala/blob/0a45cb5ae6d1345a7d531c22d174c99ea7cedea0/tests/metadata/test_recursive_listing.py#L126
> The test should still pass, but it actually fails.
> A simpler way to reproduce the issue is:
> {code:sql}
> create table my_tbl (i int) stored as textfile
> tblproperties('impala.disable.recursive.listing'='true');
> -- Get the table location, e.g. hdfs://localhost:20500/test-warehouse/my_tbl
> describe formatted my_tbl;
> {code}
> Upload 3 files to that table location: dir1/data.txt, dir2/data.txt, data.txt.
> {code}
> echo 1 > data.txt
> hdfs dfs -mkdir hdfs://localhost:20500/test-warehouse/my_tbl/dir1
> hdfs dfs -mkdir hdfs://localhost:20500/test-warehouse/my_tbl/dir2
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/my_tbl/
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/my_tbl/dir1
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/my_tbl/dir2
> {code}
> Then refresh the table and show the files:
> {code:sql}
> refresh my_tbl;
> show files in my_tbl;
> +------------------------------------------------------------+------+-----------+-----------+
> | Path                                                       | Size | Partition | EC Policy |
> +------------------------------------------------------------+------+-----------+-----------+
> | hdfs://localhost:20500/test-warehouse/my_tbl/data.txt      | 2B   |           | NONE      |
> | hdfs://localhost:20500/test-warehouse/my_tbl/dir1/data.txt | 2B   |           | NONE      |
> | hdfs://localhost:20500/test-warehouse/my_tbl/dir2/data.txt | 2B   |           | NONE      |
> +------------------------------------------------------------+------+-----------+-----------+
> {code}
> Only the first file, which sits directly under the table folder, should be
> shown in the results. The other two files are in subdirectories and should be
> ignored, since recursive listing is disabled.
> This feature was added in IMPALA-8454. Though it is rarely used in production,
> it would be nice to fix it.
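The expected behavior can be sketched on a local filesystem. This is not Impala code and makes no claim about how the catalog lists files: a temporary directory stands in for the HDFS table location, and {{find}} with and without {{-maxdepth 1}} stands in for a recursive versus non-recursive listing over the same layout as the reproduction above.

```shell
#!/usr/bin/env bash
# Sketch only: recreates the reproduction's file layout on the local
# filesystem (assumption: a temp dir stands in for .../test-warehouse/my_tbl).
set -euo pipefail

tbl_dir=$(mktemp -d)
trap 'rm -rf "$tbl_dir"' EXIT   # clean up the temp dir when the script exits

mkdir -p "$tbl_dir/dir1" "$tbl_dir/dir2"
for f in data.txt dir1/data.txt dir2/data.txt; do
  echo 1 > "$tbl_dir/$f"
done

# Recursive listing: every file below the table directory
# (what SHOW FILES returned in the report).
recursive=$(cd "$tbl_dir" && find . -type f | sort)

# Non-recursive listing: only files directly under the table directory
# (what is expected when impala.disable.recursive.listing=true).
non_recursive=$(cd "$tbl_dir" && find . -maxdepth 1 -type f | sort)

echo "recursive:"
echo "$recursive"
echo "non-recursive:"
echo "$non_recursive"
```

Only ./data.txt survives the non-recursive listing, which matches what SHOW FILES is expected to return here. Against HDFS itself, {{hdfs dfs -ls}} versus {{hdfs dfs -ls -R}} gives a similar contrast, although the non-recursive form also shows dir1 and dir2 as directory entries.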



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
