Csaba Ringhofer created IMPALA-14365:
----------------------------------------
Summary: Test framework cleanup 2025
Key: IMPALA-14365
URL: https://issues.apache.org/jira/browse/IMPALA-14365
Project: IMPALA
Issue Type: Improvement
Components: Infrastructure, Test
Reporter: Csaba Ringhofer
It would be nice to rethink the test set and infrastructure in Impala.
A few reasons why this is timely now:
- Python 2->3 migration of EE tests is nearly ready (IMPALA-8508)
- Beeswax protocol deprecation (IMPALA-12095) is nearly finished and tests can
assume HS2 (or hs2-http)
- test vector set generation was wrong, and the fix leads to many more tests in
exhaustive runs (IMPALA-8508)
- workload handling is confusing and was often misunderstood: IMPALA-3947
- our tests are simply slow (>5h to merge a commit)
Some testing decisions were made more than a decade ago, and priorities have
shifted since then. Some examples:
- Impala is mainly used with Parquet files, while some rarely used file formats
have a high footprint in the exhaustive test vector set (e.g. sequence files,
rc files)
- HBase is rarely used through Impala while it has a large impact on dataload
and tests
- Hive ACID tests have a huge footprint while there is little active
development around it
- there is a lot of development around Iceberg, and compared to that the Iceberg
testdata and test set seem small
- compatibility is tested with Hive while in practice reading Spark-generated
data seems more common