[
https://issues.apache.org/jira/browse/IMPALA-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18017145#comment-18017145
]
Riza Suminto commented on IMPALA-14367:
---------------------------------------
The attached screenshot shows the tests that have increased in number (in
exhaustive exploration) after
[IMPALA-13125|http://issues.apache.org/jira/browse/IMPALA-13125]
> Reduce/rationalize test vector set for compressed file formats
> --------------------------------------------------------------
>
> Key: IMPALA-14367
> URL: https://issues.apache.org/jira/browse/IMPALA-14367
> Project: IMPALA
> Issue Type: Test
> Components: Infrastructure, Test
> Reporter: Csaba Ringhofer
> Priority: Major
> Attachments: Screenshot 2025-08-29 at 5.24.53 PM.png
>
>
> During exhaustive tests, a lot of test vectors are created for some rarely
> used file formats (e.g. rc, sequence), because these files can also be
> compressed and each file format/compression pair is considered a new item in
> the file_format dimension. Block vs. record level compression can be an extra
> dimension (e.g. seq/gzip/record). Meanwhile, the most commonly used file
> format, Parquet, can also use several compression types at page level, but
> only snappy compression is heavily tested.
> As an example, https://gerrit.cloudera.org/#/c/23342/ fixed pairwise test
> vector generation, bumping exhaustive EE/custom cluster tests from 11000 to
> 17000, and restricting some tests to use only a single compression per
> file format (single_compression_constraint()) reduced it to 16000.
> A few questions arise:
> 1. What is the priority of testing different file formats? IMO this depends
> both on the frequency of usage and on the development activity in that area.
> 2. Which tests should have a file_format dimension at all?
> 3. Which tests should consider compression in the file_format dimension?
> 4. Is it possible to also remove some vectors from test data generation, or
> are all of them needed to get good coverage? It is possible that some tables
> are created but never touched by tests.
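
To make the growth described above concrete, here is a minimal Python sketch of how treating every file format/compression pair (plus block vs. record compression) as its own item inflates the file_format dimension, and how keeping a single entry per format shrinks it. This is not the Impala test framework code: the format/codec table, the second dimension (batch_size), and the helper names file_format_dimension() and keep_one_entry_per_format() are all assumptions made up for illustration, and the real single_compression_constraint() may behave differently.

```python
from itertools import product

# Hypothetical format -> codec table; illustrative only, not Impala's real matrix.
CODECS = {
    "text": ["none"],
    "parquet": ["none", "snappy"],
    "rc": ["none", "gzip", "snappy"],
    "seq": ["none", "gzip", "snappy"],
}
# Assumption: only these formats distinguish block from record compression.
BLOCK_RECORD_FORMATS = {"seq"}


def file_format_dimension(codecs):
    """Expand every (format, codec, compression type) triple into its own item."""
    items = []
    for fmt, codec_list in codecs.items():
        for codec in codec_list:
            if codec == "none":
                items.append((fmt, codec, "none"))
            elif fmt in BLOCK_RECORD_FORMATS:
                items.append((fmt, codec, "block"))
                items.append((fmt, codec, "record"))
            else:
                items.append((fmt, codec, "block"))
    return items


def keep_one_entry_per_format(items):
    """Hypothetical constraint: keep only the first item seen for each format."""
    seen = set()
    kept = []
    for item in items:
        if item[0] not in seen:
            seen.add(item[0])
            kept.append(item)
    return kept


full = file_format_dimension(CODECS)
reduced = keep_one_entry_per_format(full)

# A second, made-up dimension to show the multiplicative effect in
# exhaustive exploration (full cross product of all dimensions).
batch_sizes = [0, 1, 16]
exhaustive_full = list(product(full, batch_sizes))
exhaustive_reduced = list(product(reduced, batch_sizes))

print(len(full), len(reduced))                        # 11 4
print(len(exhaustive_full), len(exhaustive_reduced))  # 33 12
```

In this toy setup the rarely used seq format alone contributes 5 of the 11 file_format items, which is the imbalance the description is asking about.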
--
This message was sent by Atlassian Jira
(v8.20.10#820010)