[ https://issues.apache.org/jira/browse/HIVE-18149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279445#comment-16279445 ]
Hive QA commented on HIVE-18149:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12900726/HIVE-18149.01.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11509 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[runtime_skewjoin_mapjoin_spark] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_character_length] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_octet_length] (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_view] (batchId=15)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_opt_vectorization] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=160)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_view] (batchId=110)
org.apache.hadoop.hive.ql.TestAcidOnTez.testMapJoinOnTez (batchId=224)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=227)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8117/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8117/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8117/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12900726 - PreCommit-HIVE-Build

> Stats: rownum estimation from datasize underestimates in most cases
> -------------------------------------------------------------------
>
>                 Key: HIVE-18149
>                 URL: https://issues.apache.org/jira/browse/HIVE-18149
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Statistics
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>         Attachments: HIVE-18149.01.patch, HIVE-18149.01wip01.patch
>
>
> rownum estimation is based on the following fact as of now:
> * datasize being used from the following sources:
> ** basicstats aggregates the loaded "on-heap" row sizes ; other readers are
> able to give "raw size" estimation - I've checked orc; but I'm sure others
> will do the same....api docs are a bit vague about the methods purpose...
> ** if the basicstats level info is not available; the filesystem level
> "file-size-sums" are used as the "raw data size" ; which is multiplied by the
> [deserialization
> ratio|https://github.com/apache/hive/blob/d9924ab3e285536f7e2cc15ecbea36a78c59c66d/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L261]
> ; which is currently 1.
> the problem with all of this is that deser factor is 1; and that rowsize
> counts in the online object headers..
> example; 20 rows are loaded into a partition
> [columnstats_partlvl_dp.q|https://github.com/apache/hive/blob/d9924ab3e285536f7e2cc15ecbea36a78c59c66d/ql/src/test/queries/clientpositive/columnstats_partlvl_dp.q#L7]
> after HIVE-18108 [this
> explain|https://github.com/apache/hive/blob/d9924ab3e285536f7e2cc15ecbea36a78c59c66d/ql/src/test/queries/clientpositive/columnstats_partlvl_dp.q#L25]
> will estimate the rowsize of the table to be 404 bytes; however the 20 rows
> of text is only 169 bytes...so it ends up with 0 rows...

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
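
The arithmetic in the quoted description can be illustrated with a small sketch. This is not Hive's StatsUtils code: the class and method names below are hypothetical, and the only assumption is that the estimate is "file size times deserialization factor, divided by the estimated row size" using integer division, as the description implies.

```java
// Hypothetical illustration of the estimate described in HIVE-18149;
// names and structure are assumptions, not Hive's actual implementation.
class RowCountEstimate {

    // fileSizeBytes: on-disk size used as the "raw data size"
    // deserFactor: the deserialization ratio (1 at the time of the report)
    // avgRowSizeBytes: estimated in-memory row size, including object headers
    static long estimateRows(long fileSizeBytes, double deserFactor, long avgRowSizeBytes) {
        long rawDataSize = (long) (fileSizeBytes * deserFactor);
        // Integer division truncates, so a row size larger than the
        // data size yields an estimate of 0 rows.
        return rawDataSize / avgRowSizeBytes;
    }

    public static void main(String[] args) {
        // Numbers from the description: 20 text rows occupy 169 bytes on disk,
        // while the row size is estimated at 404 bytes.
        System.out.println(estimateRows(169, 1.0, 404)); // prints 0
    }
}
```

With these inputs the 20 loaded rows are estimated as 0, which is the underestimation the issue reports.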