[ https://issues.apache.org/jira/browse/IMPALA-13996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947328#comment-17947328 ]
Quanlong Huang commented on IMPALA-13996:
-----------------------------------------
logs/file-list-begin-1.log and logs/file-list-end-1.log show that in erasure
coding builds, table tpch_parquet.lineitem has only two data files:
{noformat}
drwxr-xr-x   - jenkins supergroup         0 2025-04-23 21:18 /test-warehouse/tpch.lineitem_parquet
-rw-r--r--   1 jenkins supergroup 108505625 2025-04-23 21:18 /test-warehouse/tpch.lineitem_parquet/964c79869e367026-91e7763600000000_2136191419_data.0.parq
-rw-r--r--   1 jenkins supergroup  94429994 2025-04-23 21:18 /test-warehouse/tpch.lineitem_parquet/964c79869e367026-91e7763600000001_1400772053_data.0.parq
drwxr-xr-x   - jenkins supergroup         0 2025-04-23 21:18 /test-warehouse/tpch.lineitem_parquet/_impala_insert_staging{noformat}
They were generated by an INSERT query:
{code:sql}
INSERT OVERWRITE TABLE tpch_parquet.lineitem SELECT * FROM tpch.lineitem{code}
I extracted the profile as 964c79869e367026_91e7763600000000_profile.txt. The
query ran on only two impalads, so it generated only two data files:
{noformat}
F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2{noformat}
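As a cross-check, the file count can also be confirmed from Impala itself (a minimal sketch using the standard SHOW FILES statement; exact sizes and paths depend on the build):
{code:sql}
-- List every data file of the table with its path and size; in an
-- erasure coding build this should show only the two .parq files above.
SHOW FILES IN tpch_parquet.lineitem;{code}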
It seems the block size is larger in erasure coding builds, so the input data
file of tpch.lineitem is split into only two blocks (splits), and hence only
two impalads are used:
{noformat}
Fragment F00:
  Instance 964c79869e367026:91e7763600000000 (host=impala-ec2-centos79-m6i-4xlarge-xldisk-1f06.vpc.cloudera.com:27000):
    Last report received time: 2025-04-23 21:18:29.636
    Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/384.00 MB
  ...
  Instance 964c79869e367026:91e7763600000001 (host=impala-ec2-centos79-m6i-4xlarge-xldisk-1f06.vpc.cloudera.com:27001):
    Last report received time: 2025-04-23 21:18:27.781
    Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/334.94 MB{noformat}
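To verify the block-size hypothesis, one could ask HDFS for the block size it reports on the input files (a sketch, assuming the text table's data lives under /test-warehouse/tpch.lineitem; in hdfs dfs -stat, %o prints the block size, %b the file length in bytes, and %n the file name):
{noformat}
hdfs dfs -stat "%o %b %n" '/test-warehouse/tpch.lineitem/*'{noformat}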
> TestAllowIncompleteData.test_too_many_files fails in erasure coding builds
> --------------------------------------------------------------------------
>
> Key: IMPALA-13996
> URL: https://issues.apache.org/jira/browse/IMPALA-13996
> Project: IMPALA
> Issue Type: Bug
> Reporter: Surya Hebbar
> Assignee: Quanlong Huang
> Priority: Major
>
> TestAllowIncompleteData.test_too_many_files fails in erasure coding builds.
> Error:
> {code}
> assert "Too many files to collect in table tpch_parquet.lineitem: 3. Current limit is 1 configured by startup flag 'catalog_partial_fetch_max_files'. Consider compacting files of the table." in "Query f74919e60b835567:da9967a400000000 failed:\nLocalCatalogException: Could not load partitions for table tpch_parq...t limit is 1 configured by startup flag 'catalog_partial_fetch_max_files'. Consider compacting files of the table.\n\n"
>  + where "Query f74919e60b835567:da9967a400000000 failed:\nLocalCatalogException: Could not load partitions for table tpch_parq...t limit is 1 configured by startup flag 'catalog_partial_fetch_max_files'. Consider compacting files of the table.\n\n" = str(ImpalaBeeswaxException()){code}
>
> Stacktrace -
> {code}
> custom_cluster/test_local_catalog.py:721: in test_too_many_files
>     assert err in str(exception)
> E   assert "Too many files to collect in table tpch_parquet.lineitem: 3. Current limit is 1 configured by startup flag 'catalog_partial_fetch_max_files'. Consider compacting files of the table." in "Query f74919e60b835567:da9967a400000000 failed:\nLocalCatalogException: Could not load partitions for table tpch_parq...t limit is 1 configured by startup flag 'catalog_partial_fetch_max_files'. Consider compacting files of the table.\n\n"
> E    + where "Query f74919e60b835567:da9967a400000000 failed:\nLocalCatalogException: Could not load partitions for table tpch_parq...t limit is 1 configured by startup flag 'catalog_partial_fetch_max_files'. Consider compacting files of the table.\n\n" = str(ImpalaBeeswaxException())
> {code}