[ https://issues.apache.org/jira/browse/HIVE-18149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285012#comment-16285012 ]
Hive QA commented on HIVE-18149:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12900887/HIVE-18149.02.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8164/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8164/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8164/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult [localFile=/data/hiveptest/logs/PreCommit-HIVE-Build-8164/succeeded/205_UTBatch_service_8_tests, remoteFile=/home/hiveptest/104.198.217.87-hiveptest-0/logs/, getExitCode()=255, getException()=null, getUser()=hiveptest, getHost()=104.198.217.87, getInstance()=0]: 'Warning: Permanently added '104.198.217.87' (ECDSA) to the list of known hosts.
receiving incremental file list
./
TEST-205_UTBatch_service_8_tests-TEST-org.apache.hive.service.auth.TestLdapAtnProviderWithMiniDS.xml
    0 0% 0.00kB/s 0:00:00
    91,994 100% 2.58MB/s 0:00:00 (xfr#1, to-chk=12/14)
TEST-205_UTBatch_service_8_tests-TEST-org.apache.hive.service.auth.ldap.TestCustomQueryFilter.xml
    0 0% 0.00kB/s 0:00:00
    87,958 100% 2.33MB/s 0:00:00 (xfr#2, to-chk=11/14)
TEST-205_UTBatch_service_8_tests-TEST-org.apache.hive.service.auth.ldap.TestQuery.xml
    0 0% 0.00kB/s 0:00:00
    87,939 100% 1.22MB/s 0:00:00 (xfr#3, to-chk=10/14)
TEST-205_UTBatch_service_8_tests-TEST-org.apache.hive.service.auth.ldap.TestUserFilter.xml
    0 0% 0.00kB/s 0:00:00
    87,929 100% 1.22MB/s 0:00:00 (xfr#4, to-chk=9/14)
TEST-205_UTBatch_service_8_tests-TEST-org.apache.hive.service.auth.ldap.TestUserSearchFilter.xml
    0 0% 0.00kB/s 0:00:00
    88,367 100% 837.82kB/s 0:00:00 (xfr#5, to-chk=8/14)
TEST-205_UTBatch_service_8_tests-TEST-org.apache.hive.service.cli.TestCLIServiceConnectionLimits.xml
    0 0% 0.00kB/s 0:00:00
    89,238 100% 837.95kB/s 0:00:00 (xfr#6, to-chk=7/14)
TEST-205_UTBatch_service_8_tests-TEST-org.apache.hive.service.cli.TestCLIServiceRestore.xml
    0 0% 0.00kB/s 0:00:00
    87,707 100% 823.57kB/s 0:00:00 (xfr#7, to-chk=6/14)
TEST-205_UTBatch_service_8_tests-TEST-org.apache.hive.service.cli.TestHiveSQLException.xml
    0 0% 0.00kB/s 0:00:00
    88,448 100% 822.62kB/s 0:00:00 (xfr#8, to-chk=5/14)
maven-test.txt
    0 0% 0.00kB/s 0:00:00
    6,086 100% 56.07kB/s 0:00:00 (xfr#9, to-chk=4/14)
logs/
logs/derby.log
    0 0% 0.00kB/s 0:00:00
    989 100% 9.11kB/s 0:00:00 (xfr#10, to-chk=1/14)
logs/hive.log
    0 0% 0.00kB/s 0:00:00
    35,487,744 0% 33.84MB/s 0:02:01
    91,553,792 2% 43.66MB/s 0:01:32
    148,144,128 3% 47.09MB/s 0:01:24
    205,651,968 4% 49.04MB/s 0:01:20
    262,864,896 6% 54.22MB/s 0:01:11
    319,324,160 7% 54.35MB/s 0:01:10
    377,126,912 8% 54.63MB/s 0:01:08
Timeout, server 104.198.217.87 not responding.
rsync: connection unexpectedly closed (391788893 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [receiver=3.1.1]
rsync: connection unexpectedly closed (904 bytes received so far) [generator]
rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.1]
ssh: connect to host 104.198.217.87 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1]
ssh: connect to host 104.198.217.87 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1]
ssh: connect to host 104.198.217.87 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1]
ssh: connect to host 104.198.217.87 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1]
'
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12900887 - PreCommit-HIVE-Build

> Stats: rownum estimation from datasize underestimates in most cases
> -------------------------------------------------------------------
>
>                 Key: HIVE-18149
>                 URL: https://issues.apache.org/jira/browse/HIVE-18149
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Statistics
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>         Attachments: HIVE-18149.01.patch, HIVE-18149.01wip01.patch, HIVE-18149.02.patch
>
>
> rownum estimation is based on the following fact as of now:
> * datasize being used from the following sources:
> ** basicstats aggregates the loaded "on-heap" row sizes ; other readers are able to give "raw size" estimation - I've checked orc; but I'm sure others will do the same....api docs are a bit vague about the methods purpose...
> ** if the basicstats level info is not available; the filesystem level "file-size-sums" are used as the "raw data size" ; which is multiplied by the [deserialization ratio|https://github.com/apache/hive/blob/d9924ab3e285536f7e2cc15ecbea36a78c59c66d/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L261] ; which is currently 1.
> the problem with all of this is that deser factor is 1; and that rowsize counts in the online object headers..
> example; 20 rows are loaded into a partition [columnstats_partlvl_dp.q|https://github.com/apache/hive/blob/d9924ab3e285536f7e2cc15ecbea36a78c59c66d/ql/src/test/queries/clientpositive/columnstats_partlvl_dp.q#L7]
> after HIVE-18108 [this explain|https://github.com/apache/hive/blob/d9924ab3e285536f7e2cc15ecbea36a78c59c66d/ql/src/test/queries/clientpositive/columnstats_partlvl_dp.q#L25] will estimate the rowsize of the table to be 404 bytes; however the 20 rows of text is only 169 bytes...so it ends up with 0 rows...

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
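
The arithmetic in the quoted description of HIVE-18149 can be illustrated with a small sketch. This is not Hive's StatsUtils code: the function name, its parameters, and the use of truncating division are assumptions made for illustration. Only the figures (169 bytes on disk, 404 bytes estimated row size, 20 actual rows, deserialization factor 1) come from the issue text.

```python
def estimate_row_count(data_size_bytes, avg_row_size_bytes, deser_factor=1.0):
    """Hypothetical helper: scale the on-disk size by a deserialization
    factor, then divide by the estimated row size, truncating the result.
    This mirrors the reasoning in the issue, not Hive's actual code."""
    if avg_row_size_bytes <= 0:
        raise ValueError("average row size must be positive")
    raw_size = int(data_size_bytes * deser_factor)
    return raw_size // avg_row_size_bytes


# Figures taken from the issue description.
file_size_bytes = 169      # 20 rows of text on disk
est_row_size_bytes = 404   # row size estimate, includes object headers
actual_rows = 20

estimated = estimate_row_count(file_size_bytes, est_row_size_bytes)
print(f"estimated rows: {estimated}, actual rows: {actual_rows}")
```

With a factor of 1, the raw size (169) is smaller than a single estimated row (404), so the truncated quotient is 0 even though the partition holds 20 rows.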