Hi Team,

I have recently been trying to reproduce the following CI failure, without success:

    org.apache.iceberg.mr.hive.TestHiveIcebergStorageHandlerWithCustomCatalog > testScanTable[fileFormat=PARQUET, engine=tez] FAILED
        java.lang.IllegalArgumentException: Failed to execute Hive query 'SELECT * FROM default.customers ORDER BY customer_id DESC': Error while processing statement: FAILED : Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
        Caused by: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Since I could not reproduce the failure, and the error message in the CI logs is not very helpful, I cannot fix this flaky test for now. :(

After Marton Bod's changes that added logging for tests (https://github.com/apache/iceberg/pull/1712), we could have more information about failures in the test logs (build/test-results/test/binary/output.bin), but I am not sure whether that file is retained and accessible after a CI run.

I would like to propose adding the following to build.gradle for the CI runs:

    test {
      testLogging {
        if ("true".equalsIgnoreCase(System.getenv('CI'))) {
          events "failed", "passed"
          testLogging.showStandardStreams = true
        } else {
          events "failed"
        }
        exceptionFormat "full"
      }
    }

This would print the logs emitted during the tests to standard output for the CI runs. An example can be seen here (https://github.com/pvary/iceberg/runs/1405960983); in that patch I enabled standard streams only for the Hive-related tests, to see the results.

Pros:
- Easily accessible log information for the failed runs

Cons:
- Harder-to-read CI logs
- Possible cost associated with retaining the logs

I think having more logs would be great, but I am not sure who pays the bill, whether bigger logs could cause any problems, or whether the CI can handle the increased amount of data.

Any thoughts, comments, ideas?
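For comparison, one possible variation that might reduce the log volume is to buffer each test's output and print it only when that test fails. The snippet below is a rough sketch of that idea and has not been tried against the Iceberg build. It relies on the `onOutput` and `afterTest` hooks of Gradle's `Test` task; the names `capturedOutput` and `keyOf` are made up for illustration. It also assumes that keying the buffer by class and method name is unique enough, which may not hold for every parameterized test.

```groovy
import org.gradle.api.tasks.testing.TestResult

test {
  testLogging {
    events "failed"
    exceptionFormat "full"
  }

  if ("true".equalsIgnoreCase(System.getenv('CI'))) {
    // Hypothetical helpers: one buffer per test, keyed by "class.method".
    def capturedOutput = [:]
    def keyOf = { descriptor -> "${descriptor.className}.${descriptor.name}".toString() }

    // Collect stdout/stderr as the tests produce it instead of printing it.
    onOutput { descriptor, event ->
      def key = keyOf(descriptor)
      if (!capturedOutput.containsKey(key)) {
        capturedOutput[key] = new StringBuilder()
      }
      capturedOutput[key] << event.message
    }

    // Print the buffered output only for failed tests, then drop the buffer.
    // Output attributed to a class or suite (not a single test) is never printed here.
    afterTest { descriptor, result ->
      def output = capturedOutput.remove(keyOf(descriptor))
      if (result.resultType == TestResult.ResultType.FAILURE && output) {
        logger.lifecycle("Output of ${descriptor.className} > ${descriptor.name}:\n${output}")
      }
    }
  }
}
```

The trade-off is that output from passing tests is lost, so this would only help when the failing test itself logs the relevant information.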

Thanks,
Peter