About Usage of Hive Index
Hi, when I try to use Hive indexing, I have the following questions.

1. Does indexing have the same performance on partitioned and non-partitioned tables? How about bucketed and un-bucketed tables?
2. Is it possible to build an index on functions of the indexed columns, like create index x on my_table(my_function_1(col1, col2, ...), my_function_2(col1, col2, ...), ...)?
3. If a query uses non-indexed columns of the base table, can the index table still be used to speed up the query, e.g. a join?

Any suggestions? Thanks in advance. Best wishes, Lin
Re: Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #157
Unsubscribe. Thanks.
AST to Query String
Hi folks, Currently I am working on a project which needs to generate query string based on the modified AST. Does Hive contain this mechanism already? If not, which tools would help to complete the task? Thanks in advance. Lin
Re: AST to Query String
Hi Navis, Thanks for your suggestions. This is a good starting point. Thanks, Lin On Thu, Jul 24, 2014 at 6:00 PM, Navis류승우 wrote: > You need TokenRewriteStream for the ASTNode. which is in Context or > ParseDriver. > > String rewrite(TokenRewriteStream rewriter, ASTNode source) throws > Exception { > // some modification.. > return rewriter.toString(source.getTokenStartIndex(), > source.getTokenStopIndex()); > } > > Thanks, > Navis > > > 2014-07-25 7:17 GMT+09:00 Lin Liu : > > > Hi folks, > > > > Currently I am working on a project which needs to generate query string > > based on the modified AST. > > Does Hive contain this mechanism already? > > If not, which tools would help to complete the task? > > > > Thanks in advance. > > > > Lin > > >
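For readers without ANTLR at hand, the idea behind Navis's TokenRewriteStream suggestion can be sketched with plain Java: keep the original token texts, record replacements by token index, and re-emit the query string over a start/stop range. The class and method names below are illustrative only, not the Hive/ANTLR API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration of the token-rewrite idea: original token text is kept,
 * edits are recorded per token index, and the query string is re-emitted
 * from a start/stop range. NOT the ANTLR TokenRewriteStream itself.
 */
public class TokenRewriteSketch {
    private final List<String> tokens;        // original token texts
    private final List<String> replacements;  // null = keep original token

    public TokenRewriteSketch(List<String> tokens) {
        this.tokens = new ArrayList<>(tokens);
        this.replacements = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) replacements.add(null);
    }

    /** Record a replacement for the token at the given index. */
    public void replace(int index, String text) {
        replacements.set(index, text);
    }

    /** Re-emit the text for tokens [start..stop], applying recorded edits. */
    public String toString(int start, int stop) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i <= stop; i++) {
            if (sb.length() > 0) sb.append(' ');
            String r = replacements.get(i);
            sb.append(r != null ? r : tokens.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TokenRewriteSketch s = new TokenRewriteSketch(
            List.of("SELECT", "col1", "FROM", "my_table"));
        s.replace(1, "col2");                 // the modification to the "AST"
        System.out.println(s.toString(0, 3)); // SELECT col2 FROM my_table
    }
}
```

In the real API the start/stop indices come from `source.getTokenStartIndex()` and `source.getTokenStopIndex()` on the ASTNode, exactly as in Navis's snippet.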
Help for Hive 0.13 Unit Test Execution
Hi Folks, When I strictly followed the HiveDeveloperFAQ wiki to run the unit tests for Hive 0.13, I found that Maven merely compiles the tests but does not execute them when using the command: mvn test -Phadoop-1. However, when we specify a test case parameter, like mvn test -Dtest=TestCliDriver -Phadoop-1, Maven does execute the test. Would you please let me know whether there is anything else I need to do to successfully run the unit tests? Thanks, Lin
Review Request 55605: HIVE-15166 - Provide beeline option to set the jline history max size
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55605/ --- Review request for hive and Aihua Xu. Bugs: HIVE-15166 https://issues.apache.org/jira/browse/HIVE-15166 Repository: hive-git Description --- Currently Beeline does not provide an option to limit the max size of the beeline history file. When each query is very big, it will flood the history file and slow down beeline on startup and shutdown. Diffs - beeline/src/java/org/apache/hive/beeline/BeeLine.java 65818dd beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 9f330e3 beeline/src/main/resources/BeeLine.properties 141f0c6 beeline/src/test/org/apache/hive/beeline/TestBeelineArgParsing.java d73d374 Diff: https://reviews.apache.org/r/55605/diff/ Testing --- Manual testing + a simple test case. Thanks, Eric Lin
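Conceptually, the new option caps the history at the newest N entries. A minimal stdlib-only sketch of that behaviour (the actual patch delegates to jline's FileHistory; the class and method names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a history max-size cap: keep only the newest maxSize entries,
 * oldest first. Illustrative only, not the BeeLine/jline implementation.
 */
public class HistoryCap {
    /** Return the last maxSize entries of the history, oldest first. */
    public static List<String> cap(List<String> history, int maxSize) {
        int from = Math.max(0, history.size() - maxSize);
        return new ArrayList<>(history.subList(from, history.size()));
    }

    public static void main(String[] args) {
        List<String> history = List.of("q1", "q2", "q3", "q4");
        System.out.println(cap(history, 2)); // [q3, q4]
    }
}
```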
Re: Review Request 55605: HIVE-15166 - Provide beeline option to set the jline history max size
> On Jan. 17, 2017, 2:13 p.m., Aihua Xu wrote: > > beeline/src/java/org/apache/hive/beeline/BeeLine.java, line 1185 > > <https://reviews.apache.org/r/55605/diff/1/?file=1606448#file1606448line1185> > > > > FileHistory implementation will load 500 (default) lines during this > > constructor (it's the limitation of FileHistory) and then will resize to > > the specified size. > > > > Potentially there is a problem to see OOM, but I guess that's the best > > we can do right now to limit the output. The user in general wouldn't edit > > the history file. Yes you are right, Aihua. At least for now we have an option to control the history row size. - Eric --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55605/#review161864 --- On Jan. 17, 2017, 7:22 a.m., Eric Lin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55605/ > --- > > (Updated Jan. 17, 2017, 7:22 a.m.) > > > Review request for hive and Aihua Xu. > > > Bugs: HIVE-15166 > https://issues.apache.org/jira/browse/HIVE-15166 > > > Repository: hive-git > > > Description > --- > > Currently Beeline does not provide an option to limit the max size for > beeline history file, in the case that each query is very big, it will flood > the history file and slow down beeline on start up and shutdown. > > > Diffs > - > > beeline/src/java/org/apache/hive/beeline/BeeLine.java 65818dd > beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 9f330e3 > beeline/src/main/resources/BeeLine.properties 141f0c6 > beeline/src/test/org/apache/hive/beeline/TestBeelineArgParsing.java d73d374 > > Diff: https://reviews.apache.org/r/55605/diff/ > > > Testing > --- > > Manual testing + a simple test case. > > > Thanks, > > Eric Lin > >
Re: Review Request 55605: HIVE-15166 - Provide beeline option to set the jline history max size
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55605/ --- (Updated Jan. 19, 2017, 1:45 a.m.) Review request for hive and Aihua Xu. Changes --- Had to move setMaxSize into the shutdown hook as the options were not ready at the previous code location. Tested manually and confirmed working. Bugs: HIVE-15166 https://issues.apache.org/jira/browse/HIVE-15166 Repository: hive-git Description --- Currently Beeline does not provide an option to limit the max size of the beeline history file. When each query is very big, it will flood the history file and slow down beeline on startup and shutdown. Diffs (updated) - beeline/src/java/org/apache/hive/beeline/BeeLine.java 65818dd beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 9f330e3 beeline/src/main/resources/BeeLine.properties 141f0c6 beeline/src/test/org/apache/hive/beeline/TestBeelineArgParsing.java d73d374 Diff: https://reviews.apache.org/r/55605/diff/ Testing --- Manual testing + a simple test case. Thanks, Eric Lin
Review Request 57009: HIVE-16029 - COLLECT_SET and COLLECT_LIST does not return NULL in the result
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/57009/ --- Review request for hive and Aihua Xu. Bugs: HIVE-16029 https://issues.apache.org/jira/browse/HIVE-16029 Repository: hive-git Description --- See the test case below:

{code}
0: jdbc:hive2://localhost:1/default> select * from collect_set_test;
+----------------------+
| collect_set_test.a   |
+----------------------+
| 1                    |
| 2                    |
| NULL                 |
| 4                    |
| NULL                 |
+----------------------+

0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+----------+
| _c0      |
+----------+
| [1,2,4]  |
+----------+
{code}

The correct result should be:

{code}
0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+---------------+
| _c0           |
+---------------+
| [1,2,null,4]  |
+---------------+
{code}

Diffs - ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java 2b5e6dd Diff: https://reviews.apache.org/r/57009/diff/ Testing --- Manually tested and confirmed the result is correct:

{code}
0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+---------------+
| _c0           |
+---------------+
| [1,2,null,4]  |
+---------------+
{code}

Thanks, Eric Lin
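The expected semantics can be checked with plain Java collections: a LinkedHashSet happily stores a single null, so collecting {1, 2, null, 4, null} as a set should yield [1, 2, null, 4], matching the correct result in the description. This is a conceptual illustration, not the GenericUDAFMkCollectionEvaluator code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/**
 * Conceptual model of collect_set over rows containing NULL: a set that
 * preserves insertion order and keeps a single null, rather than dropping
 * nulls. Illustrative only, not the Hive UDAF implementation.
 */
public class CollectSetSemantics {
    public static List<Integer> collectSet(List<Integer> rows) {
        // LinkedHashSet deduplicates (including the nulls) while keeping
        // first-occurrence order.
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        // Arrays.asList is used because List.of rejects null elements.
        List<Integer> rows = Arrays.asList(1, 2, null, 4, null);
        System.out.println(collectSet(rows)); // [1, 2, null, 4]
    }
}
```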
Re: Review Request 55605: HIVE-15166 - Provide beeline option to set the jline history max size
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55605/ --- (Updated March 8, 2017, 10:27 a.m.) Review request for hive and Aihua Xu. Changes --- Updated patch based on the latest hive master branch. Bugs: HIVE-15166 https://issues.apache.org/jira/browse/HIVE-15166 Repository: hive-git Description --- Currently Beeline does not provide an option to limit the max size of the beeline history file. When each query is very big, it will flood the history file and slow down beeline on startup and shutdown. Diffs (updated) - beeline/src/java/org/apache/hive/beeline/BeeLine.java 3c8fccc beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 9f330e3 beeline/src/main/resources/BeeLine.properties af86284 beeline/src/test/org/apache/hive/beeline/TestBeelineArgParsing.java d73d374 Diff: https://reviews.apache.org/r/55605/diff/3/ Changes: https://reviews.apache.org/r/55605/diff/2-3/ Testing --- Manual testing + a simple test case. Thanks, Eric Lin
Re: Review Request 57009: HIVE-16029 - COLLECT_SET and COLLECT_LIST does not return NULL in the result
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/57009/ --- (Updated April 15, 2017, 11:52 a.m.) Review request for hive and Aihua Xu. Changes --- New patch to allow COLLECT_SET to take two arguments so that the original behaviour is maintained. Bugs: HIVE-16029 https://issues.apache.org/jira/browse/HIVE-16029 Repository: hive-git Description --- See the test case below:

{code}
0: jdbc:hive2://localhost:1/default> select * from collect_set_test;
+----------------------+
| collect_set_test.a   |
+----------------------+
| 1                    |
| 2                    |
| NULL                 |
| 4                    |
| NULL                 |
+----------------------+

0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+----------+
| _c0      |
+----------+
| [1,2,4]  |
+----------+
{code}

The correct result should be:

{code}
0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+---------------+
| _c0           |
+---------------+
| [1,2,null,4]  |
+---------------+
{code}

Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 156d19b ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 0c2cf90 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java 2b5e6dd Diff: https://reviews.apache.org/r/57009/diff/2/ Changes: https://reviews.apache.org/r/57009/diff/1-2/ Testing --- Manually tested and confirmed the result is correct:

{code}
0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+---------------+
| _c0           |
+---------------+
| [1,2,null,4]  |
+---------------+
{code}

Thanks, Eric Lin
Re: Review Request 57009: HIVE-16029 - COLLECT_SET and COLLECT_LIST does not return NULL in the result
> On Feb. 24, 2017, 4:08 p.m., Aihua Xu wrote: > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java > > Line 118 (original) > > <https://reviews.apache.org/r/57009/diff/1/?file=1646634#file1646634line118> > > > > I just checked the java. Seems java set doesn't include null. Let's ask > > Chao for the opinion since he worked on that fix. Attached new patch to maintain original behaviour - Eric --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/57009/#review166717 ------- On Feb. 24, 2017, 1:01 a.m., Eric Lin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/57009/ > --- > > (Updated Feb. 24, 2017, 1:01 a.m.) > > > Review request for hive and Aihua Xu. > > > Bugs: HIVE-16029 > https://issues.apache.org/jira/browse/HIVE-16029 > > > Repository: hive-git > > > Description > --- > > See the test case below: > > {code} > 0: jdbc:hive2://localhost:1/default> select * from collect_set_test; > +-+ > | collect_set_test.a | > +-+ > | 1 | > | 2 | > | NULL| > | 4 | > | NULL| > +-+ > > 0: jdbc:hive2://localhost:1/default> select collect_set(a) from > collect_set_test; > +---+ > | _c0 | > +---+ > | [1,2,4] | > +---+ > > {code} > > The correct result should be: > > {code} > 0: jdbc:hive2://localhost:1/default> select collect_set(a) from > collect_set_test; > +---+ > | _c0 | > +---+ > | [1,2,null,4] | > +---+ > {code} > > > Diffs > - > > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java > 2b5e6dd > > > Diff: https://reviews.apache.org/r/57009/diff/1/ > > > Testing > --- > > Manully tested and confirmed result is correct: > > {code} > 0: jdbc:hive2://localhost:1/default> select collect_set(a) from > collect_set_test; > +---+ > | _c0 | > +---+ > | [1,2,null,4] | > +---+ > {code} > > > Thanks, > > Eric Lin > >
Re: Review Request 57009: HIVE-16029 - COLLECT_SET and COLLECT_LIST does not return NULL in the result
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/57009/ --- (Updated May 23, 2017, 7:14 a.m.) Review request for hive and Aihua Xu. Changes --- Updated test cases that failed with the last patch. Bugs: HIVE-16029 https://issues.apache.org/jira/browse/HIVE-16029 Repository: hive-git Description --- See the test case below:

{code}
0: jdbc:hive2://localhost:1/default> select * from collect_set_test;
+----------------------+
| collect_set_test.a   |
+----------------------+
| 1                    |
| 2                    |
| NULL                 |
| 4                    |
| NULL                 |
+----------------------+

0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+----------+
| _c0      |
+----------+
| [1,2,4]  |
+----------+
{code}

The correct result should be:

{code}
0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+---------------+
| _c0           |
+---------------+
| [1,2,null,4]  |
+---------------+
{code}

Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 156d19b ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 0c2cf90 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java 2b5e6dd ql/src/test/results/clientpositive/llap/udaf_collect_set_2.q.out aa55979 ql/src/test/results/clientpositive/spark/udaf_collect_set.q.out ee152ca ql/src/test/results/clientpositive/udaf_collect_set.q.out ee152ca ql/src/test/results/clientpositive/udaf_collect_set_2.q.out f2e76a7 Diff: https://reviews.apache.org/r/57009/diff/3/ Changes: https://reviews.apache.org/r/57009/diff/2-3/ Testing --- Manually tested and confirmed the result is correct:

{code}
0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+---------------+
| _c0           |
+---------------+
| [1,2,null,4]  |
+---------------+
{code}

Thanks, Eric Lin
Re: Review Request 57009: HIVE-16029 - COLLECT_SET and COLLECT_LIST does not return NULL in the result
> On Feb. 24, 2017, 4:08 p.m., Aihua Xu wrote: > > Hi Aihua, Can you please help to review my change and provide suggestions? Thanks - Eric --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/57009/#review166717 --- On May 23, 2017, 7:14 a.m., Eric Lin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/57009/ > --- > > (Updated May 23, 2017, 7:14 a.m.) > > > Review request for hive and Aihua Xu. > > > Bugs: HIVE-16029 > https://issues.apache.org/jira/browse/HIVE-16029 > > > Repository: hive-git > > > Description > --- > > See the test case below: > > {code} > 0: jdbc:hive2://localhost:1/default> select * from collect_set_test; > +-+ > | collect_set_test.a | > +-+ > | 1 | > | 2 | > | NULL| > | 4 | > | NULL| > +-+ > > 0: jdbc:hive2://localhost:1/default> select collect_set(a) from > collect_set_test; > +---+ > | _c0 | > +---+ > | [1,2,4] | > +---+ > > {code} > > The correct result should be: > > {code} > 0: jdbc:hive2://localhost:1/default> select collect_set(a) from > collect_set_test; > +---+ > | _c0 | > +---+ > | [1,2,null,4] | > +---+ > {code} > > > Diffs > - > > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java > 156d19b > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java > 0c2cf90 > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java > 2b5e6dd > ql/src/test/results/clientpositive/llap/udaf_collect_set_2.q.out aa55979 > ql/src/test/results/clientpositive/spark/udaf_collect_set.q.out ee152ca > ql/src/test/results/clientpositive/udaf_collect_set.q.out ee152ca > ql/src/test/results/clientpositive/udaf_collect_set_2.q.out f2e76a7 > > > Diff: https://reviews.apache.org/r/57009/diff/3/ > > > Testing > --- > > Manully tested and confirmed result is correct: > > {code} > 0: jdbc:hive2://localhost:1/default> select collect_set(a) from > collect_set_test; > +---+ > | _c0 | > +---+ 
> | [1,2,null,4] | > +---+ > {code} > > > Thanks, > > Eric Lin > >
Review Request 59978: HIVE-16794 - Default value for hive.spark.client.connect.timeout of 1000ms is too low
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/59978/ --- Review request for hive and Aihua Xu. Bugs: HIVE-16794 https://issues.apache.org/jira/browse/HIVE-16794 Repository: hive-git Description --- Currently the default value for hive.spark.client.connect.timeout is 1000ms, which is only 1 second. This is not enough when the cluster is busy, and users will constantly get the following timeout errors:

17/05/03 03:20:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915
java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
	at org.apache.hive.spark.client.RemoteDriver.(RemoteDriver.java:156)
	at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:220)
	at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
17/05/03 03:20:08 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915)
17/05/03 03:20:16 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 10 ms. Please check earlier log output for errors. Failing the application.
17/05/03 03:20:16 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915)
17/05/03 03:20:16 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1492040605432_11445
17/05/03 03:20:16 INFO util.ShutdownHookManager: Shutdown hook called

Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fce8db3 Diff: https://reviews.apache.org/r/59978/diff/1/ Testing --- This is a simple config change to increase the timeout; no test was performed. Thanks, Eric Lin
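hive.spark.client.connect.timeout is a time-valued property ("1000ms" by default), so raising it means supplying a larger value such as "30s". Below is a simplified sketch of parsing such values into milliseconds; the parsing rules here are an assumption for illustration, not HiveConf's actual time-unit validator:

```java
/**
 * Simplified parser for time-valued config strings like "1000ms" or "30s".
 * Illustrative only; HiveConf's real validator supports more units.
 */
public class TimeValue {
    /** Parse "<number><unit>" where unit is ms, s, or m, into milliseconds. */
    public static long toMillis(String v) {
        v = v.trim().toLowerCase();
        // Check "ms" before "s", since "1000ms" also ends with "s".
        if (v.endsWith("ms")) return Long.parseLong(v.substring(0, v.length() - 2));
        if (v.endsWith("s"))  return Long.parseLong(v.substring(0, v.length() - 1)) * 1000L;
        if (v.endsWith("m"))  return Long.parseLong(v.substring(0, v.length() - 1)) * 60_000L;
        return Long.parseLong(v); // bare number: assume milliseconds
    }

    public static void main(String[] args) {
        System.out.println(toMillis("1000ms")); // 1000  (the low default)
        System.out.println(toMillis("30s"));    // 30000 (a more forgiving value)
    }
}
```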
Review Request 60355: User-defined UDF functions can be registered as invariant functions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/ --- Review request for hive and cheng xu. Bugs: HIVE-16929 https://issues.apache.org/jira/browse/HIVE-16929 Repository: hive-git Description --- Add a configuration item "hive.aux.udf.package.name.list" to hive-site.xml. Hive will scan the jar packages under the $HIVE_HOME/auxlib/ directory and register the classes under the configured package names as constant functions. Such as,

hive.aux.udf.package.name.list
com.sample.udf,com.test.udf

Instructions:
1. Upload your jar file to $HIVE_HOME/auxlib.
2. Add the packages containing your UDF functions to the following configuration parameter:

hive.aux.udf.package.name.list
com.sample.udf

3. The configuration item needs to be placed in the hive-site.xml file.
4. Restart the Hive service for the change to take effect.

Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8bdefdad6 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 9795f3ef98 ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/util/UDFRegister.java PRE-CREATION Diff: https://reviews.apache.org/r/60355/diff/1/ Testing --- Thanks, ZhangBing Lin
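The registration filter the description implies can be sketched as: split the comma-separated hive.aux.udf.package.name.list value and keep only classes whose package matches one of the configured entries. Names below are illustrative; the real logic lives in the ClassUtil/UDFRegister patch:

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of filtering scanned classes by the configured package list.
 * Illustrative only, not the HIVE-16929 implementation.
 */
public class AuxUdfPackageFilter {
    /** True if className's package is listed in the comma-separated config value. */
    public static boolean isRegistrable(String className, String packageList) {
        List<String> pkgs = Arrays.asList(packageList.split("\\s*,\\s*"));
        int dot = className.lastIndexOf('.');
        String pkg = dot < 0 ? "" : className.substring(0, dot);
        return pkgs.contains(pkg);
    }

    public static void main(String[] args) {
        String conf = "com.sample.udf,com.test.udf";
        System.out.println(isRegistrable("com.sample.udf.MyUpper", conf)); // true
        System.out.println(isRegistrable("com.other.NotListed", conf));    // false
    }
}
```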
Re: Review Request 60355: HIVE-16929 User-defined UDF functions can be registered as invariant functions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/ --- (Updated June 22, 2017, 3:18 a.m.) Review request for hive, cheng xu and Xuefu Zhang. Summary (updated) - HIVE-16929 User-defined UDF functions can be registered as invariant functions Bugs: HIVE-16929 https://issues.apache.org/jira/browse/HIVE-16929 Repository: hive-git Description --- Add a configuration item "hive.aux.udf.package.name.list" to hive-site.xml. Hive will scan the jar packages under the $HIVE_HOME/auxlib/ directory and register the classes under the configured package names as constant functions. Such as,

hive.aux.udf.package.name.list
com.sample.udf,com.test.udf

Instructions:
1. Upload your jar file to $HIVE_HOME/auxlib.
2. Add the packages containing your UDF functions to the following configuration parameter:

hive.aux.udf.package.name.list
com.sample.udf

3. The configuration item needs to be placed in the hive-site.xml file.
4. Restart the Hive service for the change to take effect.

Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8bdefdad6 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 9795f3ef98 ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/util/UDFRegister.java PRE-CREATION Diff: https://reviews.apache.org/r/60355/diff/1/ Testing --- Thanks, ZhangBing Lin
Re: Review Request 60355: HIVE-16929 User-defined UDF functions can be registered as invariant functions
> On June 22, 2017, 8:17 a.m., Barna Zsombor Klara wrote:
> > Thank you for the patch ZhangBing Lin. I only had a few minor comments and nits.
> > Since you are adding several utility methods, do you think it would be possible to add a few unit tests?
> > Thanks!

Thank you for your review, I will modify it per your suggestions.

- ZhangBing

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/#review178639 ---

On June 22, 2017, 3:18 a.m., ZhangBing Lin wrote:
> --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/ ---
>
> (Updated June 22, 2017, 3:18 a.m.)
>
> Review request for hive, cheng xu and Xuefu Zhang.
>
> Bugs: HIVE-16929
>     https://issues.apache.org/jira/browse/HIVE-16929
>
> Repository: hive-git
>
> Description
> ---
>
> Add a configuration item "hive.aux.udf.package.name.list" to hive-site.xml. Hive will scan the jar packages under the $HIVE_HOME/auxlib/ directory and register the classes under the configured package names as constant functions.
> Such as,
>
> hive.aux.udf.package.name.list
> com.sample.udf,com.test.udf
>
> Instructions:
> 1. Upload your jar file to $HIVE_HOME/auxlib.
> 2. Add the packages containing your UDF functions to the following configuration parameter:
>
> hive.aux.udf.package.name.list
> com.sample.udf
>
> 3. The configuration item needs to be placed in the hive-site.xml file.
> 4. Restart the Hive service for the change to take effect.
>
> Diffs
> -
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8bdefdad6
> ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 9795f3ef98
> ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/util/UDFRegister.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/60355/diff/1/
>
> Testing
> ---
>
> Thanks,
>
> ZhangBing Lin
Re: Review Request 60355: HIVE-16929 User-defined UDF functions can be registered as invariant functions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/ --- (Updated June 23, 2017, 3:13 a.m.) Review request for hive, cheng xu and Xuefu Zhang. Bugs: HIVE-16929 https://issues.apache.org/jira/browse/HIVE-16929 Repository: hive-git Description --- Add a configuration item "hive.aux.udf.package.name.list" to hive-site.xml. Hive will scan the jar packages under the $HIVE_HOME/auxlib/ directory and register the classes under the configured package names as constant functions. Such as,

hive.aux.udf.package.name.list
com.sample.udf,com.test.udf

Instructions:
1. Upload your jar file to $HIVE_HOME/auxlib.
2. Add the packages containing your UDF functions to the following configuration parameter:

hive.aux.udf.package.name.list
com.sample.udf

3. The configuration item needs to be placed in the hive-site.xml file.
4. Restart the Hive service for the change to take effect.

Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8bdefdad6 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 9795f3ef98 ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/util/UDFRegister.java PRE-CREATION Diff: https://reviews.apache.org/r/60355/diff/2/ Changes: https://reviews.apache.org/r/60355/diff/1-2/ Testing --- Thanks, ZhangBing Lin
Re: Review Request 60355: HIVE-16929 User-defined UDF functions can be registered as invariant functions
> On June 22, 2017, 8:17 a.m., Barna Zsombor Klara wrote:
> > Thank you for the patch ZhangBing Lin. I only had a few minor comments and nits.
> > Since you are adding several utility methods, do you think it would be possible to add a few unit tests?
> > Thanks!
>
> ZhangBing Lin wrote:
>     Thank you for your review, I will modify it per your suggestions.

Thank you for your review. The unit tests need a jar file; how can I write these unit tests without a jar file?

> On June 22, 2017, 8:17 a.m., Barna Zsombor Klara wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/util/UDFRegister.java
> > Lines 40 (patched)
> > <https://reviews.apache.org/r/60355/diff/1/?file=1758014#file1758014line40>
> >
> > nit: Should this be a warning instead of info?

I have changed it to use warn instead of info.

- ZhangBing

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/#review178639 ---

On June 23, 2017, 3:13 a.m., ZhangBing Lin wrote:
> --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/ ---
>
> (Updated June 23, 2017, 3:13 a.m.)
>
> Review request for hive, cheng xu and Xuefu Zhang.
>
> Bugs: HIVE-16929
>     https://issues.apache.org/jira/browse/HIVE-16929
>
> Repository: hive-git
>
> Description
> ---
>
> Add a configuration item "hive.aux.udf.package.name.list" to hive-site.xml. Hive will scan the jar packages under the $HIVE_HOME/auxlib/ directory and register the classes under the configured package names as constant functions.
> Such as,
>
> hive.aux.udf.package.name.list
> com.sample.udf,com.test.udf
>
> Instructions:
> 1. Upload your jar file to $HIVE_HOME/auxlib.
> 2. Add the packages containing your UDF functions to the following configuration parameter:
>
> hive.aux.udf.package.name.list
> com.sample.udf
>
> 3. The configuration item needs to be placed in the hive-site.xml file.
> 4. Restart the Hive service for the change to take effect.
>
> Diffs
> -
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8bdefdad6
> ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 9795f3ef98
> ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/util/UDFRegister.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/60355/diff/2/
>
> Testing
> ---
>
> Thanks,
>
> ZhangBing Lin
Re: Review Request 60355: HIVE-16929 User-defined UDF functions can be registered as invariant functions
> On June 22, 2017, 8:17 a.m., Barna Zsombor Klara wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java
> > Lines 110 (patched)
> > <https://reviews.apache.org/r/60355/diff/1/?file=1758013#file1758013line110>
> >
> > Can we log this out instead of just writing to the err stream? Same on line 117, 122 and 149.

I have changed it to log this instead of just writing to the err stream.

- ZhangBing

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/#review178639 ---

On June 23, 2017, 3:13 a.m., ZhangBing Lin wrote:
> --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/ ---
>
> (Updated June 23, 2017, 3:13 a.m.)
>
> Review request for hive, cheng xu and Xuefu Zhang.
>
> Bugs: HIVE-16929
>     https://issues.apache.org/jira/browse/HIVE-16929
>
> Repository: hive-git
>
> Description
> ---
>
> Add a configuration item "hive.aux.udf.package.name.list" to hive-site.xml. Hive will scan the jar packages under the $HIVE_HOME/auxlib/ directory and register the classes under the configured package names as constant functions.
> Such as,
>
> hive.aux.udf.package.name.list
> com.sample.udf,com.test.udf
>
> Instructions:
> 1. Upload your jar file to $HIVE_HOME/auxlib.
> 2. Add the packages containing your UDF functions to the following configuration parameter:
>
> hive.aux.udf.package.name.list
> com.sample.udf
>
> 3. The configuration item needs to be placed in the hive-site.xml file.
> 4. Restart the Hive service for the change to take effect.
>
> Diffs
> -
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8bdefdad6
> ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 9795f3ef98
> ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/util/UDFRegister.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/60355/diff/2/
>
> Testing
> ---
>
> Thanks,
>
> ZhangBing Lin
Re: Review Request 60355: HIVE-16929 User-defined UDF functions can be registered as invariant functions
> On June 22, 2017, 8:17 a.m., Barna Zsombor Klara wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java
> > Lines 110 (patched)
> > <https://reviews.apache.org/r/60355/diff/1/?file=1758013#file1758013line110>
> >
> > Can we log this out instead of just writing to the err stream? Same on line 117, 122 and 149.
>
> ZhangBing Lin wrote:
>     I have changed it to log this instead of just writing to the err stream.

I have changed it, but the issues still show in the list.

- ZhangBing

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/#review178639 ---

On June 23, 2017, 3:13 a.m., ZhangBing Lin wrote:
> --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/ ---
>
> (Updated June 23, 2017, 3:13 a.m.)
>
> Review request for hive, cheng xu and Xuefu Zhang.
>
> Bugs: HIVE-16929
>     https://issues.apache.org/jira/browse/HIVE-16929
>
> Repository: hive-git
>
> Description
> ---
>
> Add a configuration item "hive.aux.udf.package.name.list" to hive-site.xml. Hive will scan the jar packages under the $HIVE_HOME/auxlib/ directory and register the classes under the configured package names as constant functions.
> Such as,
>
> hive.aux.udf.package.name.list
> com.sample.udf,com.test.udf
>
> Instructions:
> 1. Upload your jar file to $HIVE_HOME/auxlib.
> 2. Add the packages containing your UDF functions to the following configuration parameter:
>
> hive.aux.udf.package.name.list
> com.sample.udf
>
> 3. The configuration item needs to be placed in the hive-site.xml file.
> 4. Restart the Hive service for the change to take effect.
>
> Diffs
> -
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8bdefdad6
> ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 9795f3ef98
> ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/util/UDFRegister.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/60355/diff/2/
>
> Testing
> ---
>
> Thanks,
>
> ZhangBing Lin
Re: Review Request 60355: HIVE-16929 User-defined UDF functions can be registered as invariant functions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/ --- (Updated June 23, 2017, 3:51 a.m.) Review request for hive, cheng xu and Xuefu Zhang. Bugs: HIVE-16929 https://issues.apache.org/jira/browse/HIVE-16929 Repository: hive-git Description --- Add a configuration item "hive.aux.udf.package.name.list" to hive-site.xml. It lists the package names to scan: jars in the $HIVE_HOME/auxlib/ directory are scanned, and classes under the configured packages are registered as permanent functions. Such as, hive.aux.udf.package.name.list com.sample.udf,com.test.udf Instructions: 1. Upload your jar file to $HIVE_HOME/auxlib. 2. Add the packages containing your UDF classes to the following configuration parameter: hive.aux.udf.package.name.list com.sample.udf 3. The configuration item needs to be placed in the hive-site.xml file. 4. Restart the Hive service for it to take effect. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8bdefdad6 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 9795f3ef98 ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/util/UDFRegister.java PRE-CREATION Diff: https://reviews.apache.org/r/60355/diff/3/ Changes: https://reviews.apache.org/r/60355/diff/2-3/ Testing --- Thanks, ZhangBing Lin
Re: Review Request 60355: HIVE-16929 User-defined UDF functions can be registered as invariant functions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/ --- (Updated June 23, 2017, 3:56 a.m.) Review request for hive, cheng xu and Xuefu Zhang. Bugs: HIVE-16929 https://issues.apache.org/jira/browse/HIVE-16929 Repository: hive-git Description --- Add a configuration item "hive.aux.udf.package.name.list" to hive-site.xml. It lists the package names to scan: jars in the $HIVE_HOME/auxlib/ directory are scanned, and classes under the configured packages are registered as permanent functions. Such as, hive.aux.udf.package.name.list com.sample.udf,com.test.udf Instructions: 1. Upload your jar file to $HIVE_HOME/auxlib. 2. Add the packages containing your UDF classes to the following configuration parameter: hive.aux.udf.package.name.list com.sample.udf 3. The configuration item needs to be placed in the hive-site.xml file. 4. Restart the Hive service for it to take effect. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8bdefdad6 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 9795f3ef98 ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/util/UDFRegister.java PRE-CREATION Diff: https://reviews.apache.org/r/60355/diff/4/ Changes: https://reviews.apache.org/r/60355/diff/3-4/ Testing --- Thanks, ZhangBing Lin
Re: Review Request 60355: HIVE-16929 User-defined UDF functions can be registered as invariant functions
> On June 28, 2017, 10:49 a.m., Barna Zsombor Klara wrote: > > Sorry for getting back this late and thank you for the updates. I don't > > want to be too nitpicky but I did have another comment about rewording a > > log line, sorry. > > As for testing, you do have jars on the classpath during testing. So for > > example you can be pretty sure that the junit jar will be on your classpath > > somewhere, and you could write tests against ClassUtil using the junit > > packages. > > > > But the patch LGTM. Thank you for your suggestion; I will try to write a unit test case based on your advice. > On June 28, 2017, 10:49 a.m., Barna Zsombor Klara wrote: > > ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java > > Lines 85 (patched) > > <https://reviews.apache.org/r/60355/diff/4/?file=1760292#file1760292line85> > > > > Nit: > > I think what you meant should be one of the following: > > - Exception occurred while executing getJarFile > > - Exception occurred during the execution of getJarFile > > - getJarFile encountered an exception > > > > Same on line 90. Thank you for your review; I will modify it. - ZhangBing --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/#review179085 --- On June 23, 2017, 3:56 a.m., ZhangBing Lin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/60355/ > --- > > (Updated June 23, 2017, 3:56 a.m.) > > > Review request for hive, cheng xu and Xuefu Zhang. > > > Bugs: HIVE-16929 > https://issues.apache.org/jira/browse/HIVE-16929 > > > Repository: hive-git > > > Description > --- > > Add a configuration item "hive.aux.udf.package.name.list" to hive-site.xml. It lists the package names to scan: jars in the $HIVE_HOME/auxlib/ directory are scanned, and classes under the configured packages are registered as permanent functions.
> Such as, > > hive.aux.udf.package.name.list > com.sample.udf,com.test.udf > > Instructions: > 1. Upload your jar file to $HIVE_HOME/auxlib. > 2. Add the packages containing your UDF classes to the > following configuration parameter: > > hive.aux.udf.package.name.list > com.sample.udf > > > 3. The configuration item needs to be placed in the hive-site.xml file. > 4. Restart the Hive service for it to take effect. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8bdefdad6 > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 9795f3ef98 > ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/util/UDFRegister.java PRE-CREATION > > > Diff: https://reviews.apache.org/r/60355/diff/4/ > > > Testing > --- > > > Thanks, > > ZhangBing Lin > >
Re: Review Request 60355: HIVE-16929 User-defined UDF functions can be registered as invariant functions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/ --- (Updated June 29, 2017, 6:37 a.m.) Review request for hive, cheng xu and Xuefu Zhang. Bugs: HIVE-16929 https://issues.apache.org/jira/browse/HIVE-16929 Repository: hive-git Description --- Add a configuration item "hive.aux.udf.package.name.list" to hive-site.xml. It lists the package names to scan: jars in the $HIVE_HOME/auxlib/ directory are scanned, and classes under the configured packages are registered as permanent functions. Such as, hive.aux.udf.package.name.list com.sample.udf,com.test.udf Instructions: 1. Upload your jar file to $HIVE_HOME/auxlib. 2. Add the packages containing your UDF classes to the following configuration parameter: hive.aux.udf.package.name.list com.sample.udf 3. The configuration item needs to be placed in the hive-site.xml file. 4. Restart the Hive service for it to take effect. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8bdefdad6 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 9795f3ef98 ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/util/UDFRegister.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/util/TestClassUtil.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/util/TestUDFRegister.java PRE-CREATION Diff: https://reviews.apache.org/r/60355/diff/5/ Changes: https://reviews.apache.org/r/60355/diff/4-5/ Testing --- Thanks, ZhangBing Lin
Re: Review Request 60355: HIVE-16929 User-defined UDF functions can be registered as invariant functions
> On June 28, 2017, 10:49 a.m., Barna Zsombor Klara wrote: > > Sorry for getting back this late and thank you for the updates. I don't > > want to be too nitpicky but I did have another comment about rewording a > > log line, sorry. > > As for testing, you do have jars on the classpath during testing. So for > > example you can be pretty sure that the junit jar will be on your classpath > > somewhere, and you could write tests against ClassUtil using the junit > > packages. > > > > But the patch LGTM. > > ZhangBing Lin wrote: > Thank you for your suggestion; I will try to write a unit test case based on > your advice. I have resubmitted a new patch. The patch adds the following: 1. two unit tests, for ClassUtil and UDFRegister; 2. the custom function registration is moved to the end of registration, so that a custom function with the same name as a system function cannot override it and raise an exception; 3. fixed the contents of the log messages. - ZhangBing --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60355/#review179085 ------- On June 29, 2017, 6:37 a.m., ZhangBing Lin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/60355/ > --- > > (Updated June 29, 2017, 6:37 a.m.) > > > Review request for hive, cheng xu and Xuefu Zhang. > > > Bugs: HIVE-16929 > https://issues.apache.org/jira/browse/HIVE-16929 > > > Repository: hive-git > > > Description > --- > > Add a configuration item "hive.aux.udf.package.name.list" to hive-site.xml. It lists the package names to scan: jars in the $HIVE_HOME/auxlib/ directory are scanned, and classes under the configured packages are registered as permanent functions.
> Such as, > > hive.aux.udf.package.name.list > com.sample.udf,com.test.udf > > Instructions: > 1. Upload your jar file to $HIVE_HOME/auxlib. > 2. Add the packages containing your UDF classes to the > following configuration parameter: > > hive.aux.udf.package.name.list > com.sample.udf > > > 3. The configuration item needs to be placed in the hive-site.xml file. > 4. Restart the Hive service for it to take effect. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8bdefdad6 > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 9795f3ef98 > ql/src/java/org/apache/hadoop/hive/ql/util/ClassUtil.java PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/util/UDFRegister.java PRE-CREATION > ql/src/test/org/apache/hadoop/hive/ql/util/TestClassUtil.java PRE-CREATION > ql/src/test/org/apache/hadoop/hive/ql/util/TestUDFRegister.java > PRE-CREATION > > > Diff: https://reviews.apache.org/r/60355/diff/5/ > > > Testing > --- > > > Thanks, > > ZhangBing Lin > >
[jira] [Created] (HIVE-8312) Implicit type conversion on Join keys
Lin Liu created HIVE-8312: - Summary: Implicit type conversion on Join keys Key: HIVE-8312 URL: https://issues.apache.org/jira/browse/HIVE-8312 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Lin Liu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8312) Implicit type conversion on Join keys
[ https://issues.apache.org/jira/browse/HIVE-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HIVE-8312: -- Description: Suppose we have a query as follows. " SELECT FROM A LEFT SEMI JOIN B ON (A.col1 = B.col2) WHERE ... " If A.col1 is of STRING type, but B.col2 is of BIGINT or DOUBLE, Hive finds the common compatible type (here, DOUBLE) for both columns and does an implicit type conversion. However, this implicit conversion from STRING to DOUBLE could produce NULL values, which could in turn generate unexpected results, like skew. I just wonder: Is this case by design? If so, what is the logic? If not, how can we solve it? > Implicit type conversion on Join keys > - > > Key: HIVE-8312 > URL: https://issues.apache.org/jira/browse/HIVE-8312 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Lin Liu > > Suppose we have a query as follows. > " > SELECT > FROM A LEFT SEMI JOIN B > ON (A.col1 = B.col2) > WHERE ... > " > If A.col1 is of STRING type, but B.col2 is of BIGINT or DOUBLE, > Hive finds the common compatible type (here, DOUBLE) for both columns and does > an implicit type conversion. > However, this implicit conversion from STRING to DOUBLE could produce NULL > values, which could in turn > generate unexpected results, like skew. > I just wonder: Is this case by design? If so, what is the logic? If not, how > can we solve it? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8312) Implicit type conversion on Join keys
[ https://issues.apache.org/jira/browse/HIVE-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HIVE-8312: -- Description: Suppose we have a query as follows. " SELECT FROM A LEFT SEMI JOIN B ON (A.col1 = B.col2) WHERE ... " If A.col1 is of STRING type, but B.col2 is of BIGINT or DOUBLE, Hive finds the common compatible NUMERIC type (here, DOUBLE) for both columns and does an implicit type conversion. However, this implicit conversion from STRING to DOUBLE could produce NULL values, which could in turn generate unexpected results, like skew. Why do we always convert to a NUMERIC type? What is the rationale here? If this is not expected, how should we handle it? was: Suppose we have a query as follows. " SELECT FROM A LEFT SEMI JOIN B ON (A.col1 = B.col2) WHERE ... " If A.col1 is of STRING type, but B.col2 is of BIGINT or DOUBLE, Hive finds the common compatible type (here, DOUBLE) for both columns and does an implicit type conversion. However, this implicit conversion from STRING to DOUBLE could produce NULL values, which could in turn generate unexpected results, like skew. I just wonder: Is this case by design? If so, what is the logic? If not, how can we solve it? > Implicit type conversion on Join keys > - > > Key: HIVE-8312 > URL: https://issues.apache.org/jira/browse/HIVE-8312 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Lin Liu > > Suppose we have a query as follows. > " > SELECT > FROM A LEFT SEMI JOIN B > ON (A.col1 = B.col2) > WHERE ... > " > If A.col1 is of STRING type, but B.col2 is of BIGINT or DOUBLE, > Hive finds the common compatible NUMERIC type (here, DOUBLE) for both columns > and does an implicit type conversion. > However, this implicit conversion from STRING to DOUBLE could produce NULL > values, which could in turn > generate unexpected results, like skew. > Why do we always convert to a NUMERIC type? What is the rationale here? If this is not > expected, how should we handle it? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8312) Implicit type conversion on Join keys
[ https://issues.apache.org/jira/browse/HIVE-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HIVE-8312: -- Description: Suppose we have a query as follows. " SELECT FROM A LEFT SEMI JOIN B ON (A.col1 = B.col2) WHERE ... " If A.col1 is of STRING type, but B.col2 is of BIGINT or DOUBLE, Hive finds the common compatible NUMERIC type (here, DOUBLE) for both columns and does an implicit type conversion. However, this implicit conversion from STRING to DOUBLE could produce NULL values, which could in turn generate unexpected results, like skew. Why do we always convert to a NUMERIC type when the two columns are in different type groups? Do we expect the corresponding exceptions to happen? was: Suppose we have a query as follows. " SELECT FROM A LEFT SEMI JOIN B ON (A.col1 = B.col2) WHERE ... " If A.col1 is of STRING type, but B.col2 is of BIGINT or DOUBLE, Hive finds the common compatible NUMERIC type (here, DOUBLE) for both columns and does an implicit type conversion. However, this implicit conversion from STRING to DOUBLE could produce NULL values, which could in turn generate unexpected results, like skew. Why do we always convert to a NUMERIC type? What is the rationale here? If this is not expected, how should we handle it? > Implicit type conversion on Join keys > - > > Key: HIVE-8312 > URL: https://issues.apache.org/jira/browse/HIVE-8312 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Lin Liu > > Suppose we have a query as follows. > " > SELECT > FROM A LEFT SEMI JOIN B > ON (A.col1 = B.col2) > WHERE ... > " > If A.col1 is of STRING type, but B.col2 is of BIGINT or DOUBLE, > Hive finds the common compatible NUMERIC type (here, DOUBLE) for both columns > and does an implicit type conversion. > However, this implicit conversion from STRING to DOUBLE could produce NULL > values, which could in turn > generate unexpected results, like skew. > Why do we always convert to a NUMERIC type when the two columns are in different > type groups? > Do we expect the corresponding exceptions to happen? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
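Until the semantics are settled, one workaround for the behavior described in this issue is to cast the join key explicitly so both sides stay in the string type group. This is a sketch using the table and column names from the description; whether a string or numeric comparison is appropriate depends on the actual data:

```sql
-- Explicitly cast the BIGINT/DOUBLE side to STRING instead of letting Hive
-- pick the common NUMERIC type; a string comparison avoids the NULLs that a
-- failed STRING -> DOUBLE conversion would silently introduce.
SELECT A.*
FROM A LEFT SEMI JOIN B
  ON (A.col1 = CAST(B.col2 AS STRING));
```

Note that string comparison has its own pitfalls (e.g. '1' vs '1.0' no longer match), so this trades one set of surprises for a more visible one.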
[jira] [Created] (HIVE-22091) NPE on HS2 start up due to bad data in FUNCS table
Eric Lin created HIVE-22091: --- Summary: NPE on HS2 start up due to bad data in FUNCS table Key: HIVE-22091 URL: https://issues.apache.org/jira/browse/HIVE-22091 Project: Hive Issue Type: Bug Affects Versions: 3.1.0 Reporter: Eric Lin If the FUNCS table contains a stale DB_ID that has no link in the DBS table, HS2 will fail to start up with an NPE error:

{code:bash}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.NullPointerException)
	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:220)
	at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:338)
	at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:299)
	at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:274)
	at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:256)
	at org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider.init(DefaultHiveAuthorizationProvider.java:29)
	at org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProviderBase.setConf(HiveAuthorizationProviderBase.java:112)
	... 21 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.NullPointerException)
	at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3646)
	at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:231)
	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:215)
	... 27 more
{code}

-- This message was sent by Atlassian JIRA (v7.6.14#76016)
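A quick way to spot the bad data described in this issue is to query the metastore backing database directly. This is a sketch: FUNCS and DBS are the metastore tables named in the report, and the DB_ID join column plus FUNC_ID/FUNC_NAME are assumed from the standard metastore schema:

```sql
-- Find FUNCS rows whose DB_ID no longer exists in DBS; these are the stale
-- entries that trigger the NPE during registerAllFunctionsOnce on startup.
SELECT f.FUNC_ID, f.FUNC_NAME, f.DB_ID
FROM FUNCS f
LEFT JOIN DBS d ON f.DB_ID = d.DB_ID
WHERE d.DB_ID IS NULL;
```

Any rows returned would need to be cleaned up (or re-linked to a valid database) before HS2 can start.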
[jira] [Created] (HIVE-23676) Test to cover wildcard partVals in listPartitionNames
anton lin created HIVE-23676: Summary: Test to cover wildcard partVals in listPartitionNames Key: HIVE-23676 URL: https://issues.apache.org/jira/browse/HIVE-23676 Project: Hive Issue Type: Bug Components: Metastore, Test Reporter: anton lin Incorrect documentation for the MetaStoreClient method {code:java} List<String> listPartitionNames(String db_name, String tbl_name, List<String> part_vals, short max_parts) {code} saying _"...If you wish to accept any value for a particular key you can pass ".*" for that value in this list..."_. The any-value wildcard behaviour is actually achieved with the empty string _""_. Documentation and tests should reflect this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23974) (hadoop3.3.0 + hive2.3.7) start hiveserver2 get exception: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch
Jimmy Lin created HIVE-23974: Summary: (hadoop3.3.0 + hive2.3.7) start hiveserver2 get exception: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch Key: HIVE-23974 URL: https://issues.apache.org/jira/browse/HIVE-23974 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 2.3.7 Reporter: Jimmy Lin Attachments: h1.png, h2.png With Hadoop 3.3.0 and Hive 2.3.7, starting HiveServer2 raises an exception: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch, as shown in h2.png. I have also replaced Hive's Guava jar with Hadoop's guava-20.0.jar so that both use the same version of Guava, as shown in h1.png. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-17630) RESIGNAL:actual results are inconsistent with expectations at hplsql
ZhangBing Lin created HIVE-17630: Summary: RESIGNAL:actual results are inconsistent with expectations at hplsql Key: HIVE-17630 URL: https://issues.apache.org/jira/browse/HIVE-17630 Project: Hive Issue Type: Bug Components: hpl/sql Reporter: ZhangBing Lin Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-18026) Hive webhcat principal configuration optimization
ZhangBing Lin created HIVE-18026: Summary: Hive webhcat principal configuration optimization Key: HIVE-18026 URL: https://issues.apache.org/jira/browse/HIVE-18026 Project: Hive Issue Type: Bug Reporter: ZhangBing Lin Assignee: ZhangBing Lin Hive WebHCat principal configuration optimization: when you configure templeton.kerberos.principal as HTTP/_HOST@ , the '_HOST' should be replaced by the specific host name. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-18277) Use subtraction and addition should be a good tips when type not support with hplsql
ZhangBing Lin created HIVE-18277: Summary: Use subtraction and addition should be a good tips when type not support with hplsql Key: HIVE-18277 URL: https://issues.apache.org/jira/browse/HIVE-18277 Project: Hive Issue Type: Bug Components: hpl/sql Affects Versions: 3.0.0 Reporter: ZhangBing Lin Assignee: ZhangBing Lin Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-9662) return columns from HCatalog be ordered in the same manner as hive client for Map data type
Lin Shao created HIVE-9662: -- Summary: return columns from HCatalog be ordered in the same manner as hive client for Map data type Key: HIVE-9662 URL: https://issues.apache.org/jira/browse/HIVE-9662 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Lin Shao Fix For: 0.13.1 In hive-0.13.1, the output order of a map datatype column from HCatalog differs from the Hive output order, but the Struct type is fine. This behavior can also be seen with the Text file format, not just the RC file format. It is caused by the changes in the patch for HIVE-7282. That patch should not go into Hive 0.13.1 unless it has a better way to preserve the order of key-value pairs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326302#comment-14326302 ] Peter Lin commented on HIVE-7292: - Would love to use this in production; is it going to be released in hive 15? > Hive on Spark > - > > Key: HIVE-7292 > URL: https://issues.apache.org/jira/browse/HIVE-7292 > Project: Hive > Issue Type: Improvement > Components: Spark >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 > Attachments: Hive-on-Spark.pdf > > > Spark as an open-source data analytics cluster computing framework has gained > significant momentum recently. Many Hive users already have Spark installed > as their computing backbone. To take advantage of Hive, they still need to > have either MapReduce or Tez on their cluster. This initiative will provide > users a new alternative so that those users can consolidate their backend. > Secondly, providing such an alternative further increases Hive's adoption, as > it exposes Spark users to a viable, feature-rich de facto standard SQL tool > on Hadoop. > Finally, allowing Hive to run on Spark also has performance benefits. Hive > queries, especially those involving multiple reducer stages, will run faster, > thus improving user experience as Tez does. > This is an umbrella JIRA which will cover many coming subtasks. Design doc > will be attached here shortly, and will be on the wiki as well. Feedback from > the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-16029) COLLECT_SET and COLLECT_LIST does not return NULL in the result
Eric Lin created HIVE-16029: --- Summary: COLLECT_SET and COLLECT_LIST does not return NULL in the result Key: HIVE-16029 URL: https://issues.apache.org/jira/browse/HIVE-16029 Project: Hive Issue Type: Bug Affects Versions: 2.1.1 Reporter: Eric Lin Assignee: Eric Lin Priority: Minor See the test case below:

0: jdbc:hive2://localhost:1/default> select * from collect_set_test;
+----------------------+
| collect_set_test.a   |
+----------------------+
| 1                    |
| 2                    |
| NULL                 |
| 4                    |
| NULL                 |
+----------------------+

0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+----------+
|   _c0    |
+----------+
| [1,2,4]  |
+----------+

The correct result should be:

0: jdbc:hive2://localhost:1/default> select collect_set(a) from collect_set_test;
+---------------+
|     _c0       |
+---------------+
| [1,2,null,4]  |
+---------------+

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
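The test case above boils down to a few statements. This is a sketch; the table name is taken from the report, and the DDL/DML around it is assumed:

```sql
-- Reproduce the reported behavior: NULL elements disappear from the result.
CREATE TABLE collect_set_test (a INT);
INSERT INTO collect_set_test VALUES (1), (2), (NULL), (4), (NULL);

-- Returns [1,2,4]: the NULL rows are silently dropped, whereas the reporter
-- expects [1,2,null,4].
SELECT collect_set(a) FROM collect_set_test;
```

The same dropping behavior is reported for collect_list, which should otherwise preserve duplicates and ordering.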
[jira] [Created] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
ZhangBing Lin created HIVE-16524: Summary: Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon Key: HIVE-16524 URL: https://issues.apache.org/jira/browse/HIVE-16524 Project: Hive Issue Type: Bug Affects Versions: 3.0.0 Reporter: ZhangBing Lin Assignee: ZhangBing Lin Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-16558) In the hiveserver2.jsp Closed Queries table under the data click Drilldown Link view details, the Chinese show garbled
ZhangBing Lin created HIVE-16558: Summary: In the hiveserver2.jsp Closed Queries table under the data click Drilldown Link view details, the Chinese show garbled Key: HIVE-16558 URL: https://issues.apache.org/jira/browse/HIVE-16558 Project: Hive Issue Type: Bug Affects Versions: 2.1.0 Reporter: ZhangBing Lin Assignee: ZhangBing Lin Fix For: 3.0.0 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-16640) The ASF Heads have some errors in some class
ZhangBing Lin created HIVE-16640: Summary: The ASF Heads have some errors in some class Key: HIVE-16640 URL: https://issues.apache.org/jira/browse/HIVE-16640 Project: Hive Issue Type: Bug Affects Versions: 3.0.0 Reporter: ZhangBing Lin Assignee: ZhangBing Lin Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-16707) Invalid variable and function of the parameters is invalid
ZhangBing Lin created HIVE-16707: Summary: Invalid variable and function of the parameters is invalid Key: HIVE-16707 URL: https://issues.apache.org/jira/browse/HIVE-16707 Project: Hive Issue Type: Bug Affects Versions: 3.0.0 Reporter: ZhangBing Lin Assignee: ZhangBing Lin Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-16794) Default value for hive.spark.client.connect.timeout of 1000ms is too low
Eric Lin created HIVE-16794: --- Summary: Default value for hive.spark.client.connect.timeout of 1000ms is too low Key: HIVE-16794 URL: https://issues.apache.org/jira/browse/HIVE-16794 Project: Hive Issue Type: Task Components: Spark Affects Versions: 2.1.1 Reporter: Eric Lin Currently the default timeout value for hive.spark.client.connect.timeout is set at 1000ms, which is only 1 second. This is not enough when the cluster is busy, and users will constantly get the following timeout errors:

{code}
17/05/03 03:20:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915
java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
	at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
	at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:220)
	at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
17/05/03 03:20:08 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915)
17/05/03 03:20:16 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 10 ms. Please check earlier log output for errors. Failing the application.
17/05/03 03:20:16 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915)
17/05/03 03:20:16 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1492040605432_11445
17/05/03 03:20:16 INFO util.ShutdownHookManager: Shutdown hook called
{code}

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
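Until the default changes, the usual mitigation for the timeout above is to raise the value in hive-site.xml. This is a sketch; 30000ms is an illustrative value, not one suggested in the report:

```xml
<!-- Hypothetical override of the 1000ms default discussed above -->
<property>
  <name>hive.spark.client.connect.timeout</name>
  <value>30000ms</value>
  <description>Raised so a busy cluster has time to accept the remote driver
  connection before the client gives up.</description>
</property>
```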
[jira] [Created] (HIVE-16824) PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers
ZhangBing Lin created HIVE-16824: Summary: PrimaryToReplicaResourceFunctionTest.java lack the ASF Headers Key: HIVE-16824 URL: https://issues.apache.org/jira/browse/HIVE-16824 Project: Hive Issue Type: Bug Reporter: ZhangBing Lin Assignee: ZhangBing Lin Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
ZhangBing Lin created HIVE-16929: Summary: User-defined UDF functions can be registered as invariant functions Key: HIVE-16929 URL: https://issues.apache.org/jira/browse/HIVE-16929 Project: Hive Issue Type: New Feature Reporter: ZhangBing Lin Assignee: ZhangBing Lin -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17015) Script file should be LF, should not be CRLF, this will lead to the script can not be implemented, the service can not start
ZhangBing Lin created HIVE-17015: Summary: Script file should be LF, should not be CRLF, this will lead to the script can not be implemented, the service can not start Key: HIVE-17015 URL: https://issues.apache.org/jira/browse/HIVE-17015 Project: Hive Issue Type: Bug Reporter: ZhangBing Lin Assignee: ZhangBing Lin The script files should use LF line endings, not CRLF; otherwise the scripts cannot be executed and the service cannot start. List: bin/beeline, bin/hive, bin/hiveserver2, bin/hplsql, bin/metatool, bin/schematool. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
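The fix amounts to stripping the carriage returns from each listed script. A minimal sketch, demonstrated on a temporary copy rather than the real bin/ scripts (dos2unix or `sed -i 's/\r$//'` would do the same job):

```shell
# Simulate a CRLF-damaged script, then convert it to LF with tr (POSIX-portable).
printf 'echo hello\r\n' > /tmp/crlf_demo.sh
tr -d '\r' < /tmp/crlf_demo.sh > /tmp/crlf_demo.lf
mv /tmp/crlf_demo.lf /tmp/crlf_demo.sh
sh /tmp/crlf_demo.sh   # prints "hello" with no trailing carriage return
```

The same one-liner, applied to each of bin/beeline, bin/hive, bin/hiveserver2, bin/hplsql, bin/metatool and bin/schematool, makes the scripts executable again.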
[jira] [Created] (HIVE-17065) You can not successfully deploy hive clusters with Hive guidance documents
ZhangBing Lin created HIVE-17065: Summary: You can not successfully deploy hive clusters with Hive guidance documents Key: HIVE-17065 URL: https://issues.apache.org/jira/browse/HIVE-17065 Project: Hive Issue Type: Improvement Components: Documentation Reporter: ZhangBing Lin Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17094) Modifier 'static' is redundant for inner enums
ZhangBing Lin created HIVE-17094: Summary: Modifier 'static' is redundant for inner enums Key: HIVE-17094 URL: https://issues.apache.org/jira/browse/HIVE-17094 Project: Hive Issue Type: Improvement Reporter: ZhangBing Lin Assignee: ZhangBing Lin Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17453) Missing ASF headers in 2 classes
ZhangBing Lin created HIVE-17453: Summary: Missing ASF headers in 2 classes Key: HIVE-17453 URL: https://issues.apache.org/jira/browse/HIVE-17453 Project: Hive Issue Type: Bug Reporter: ZhangBing Lin Assignee: ZhangBing Lin Priority: Trivial
[jira] [Created] (HIVE-17554) java.lang.ArithmeticException: / by zero occurs in the hplsql component
ZhangBing Lin created HIVE-17554: Summary: java.lang.ArithmeticException: / by zero occurs in the hplsql component Key: HIVE-17554 URL: https://issues.apache.org/jira/browse/HIVE-17554 Project: Hive Issue Type: Bug Components: hpl/sql Reporter: ZhangBing Lin Assignee: ZhangBing Lin
[jira] [Created] (HIVE-12368) Provide support for different versions of same JAR files for loading UDFs
Eric Lin created HIVE-12368: --- Summary: Provide support for different versions of same JAR files for loading UDFs Key: HIVE-12368 URL: https://issues.apache.org/jira/browse/HIVE-12368 Project: Hive Issue Type: New Feature Components: HiveServer2 Reporter: Eric Lin Assignee: Vaibhav Gumashta If we want to set up one cluster to support multiple environments (DEV, QA, PRE-PROD, etc.), data from each environment is generated into a different HDFS location and Hive database. This works fine; however, deploying UDF classes for the different environments becomes tricky, because each class has the same namespace even though we have created udf-dev.jar, udf-qa.jar, etc. Running one HS2 per environment is another option, but with an LB setup that becomes harder. The request is for HS2 to support loading UDFs in such an environment; the implementation is open to discussion. I know this setup is not ideal (the better approach is one cluster per environment), but when the number of nodes is limited this might be the only option, and I believe many people can benefit from it. Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12506) SHOW CREATE TABLE command creates a table that does not work for RCFile format
Eric Lin created HIVE-12506: --- Summary: SHOW CREATE TABLE command creates a table that does not work for RCFile format Key: HIVE-12506 URL: https://issues.apache.org/jira/browse/HIVE-12506 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.1.1 Reporter: Eric Lin See the following test case:
1) Create a table with RCFile format:
{code}
DROP TABLE IF EXISTS test;
CREATE TABLE test (a int) PARTITIONED BY (p int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS RCFILE;
{code}
2) Run "DESC FORMATTED test":
{code}
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
InputFormat: org.apache.hadoop.hive.ql.io.RCFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
{code}
This shows that the SerDe used is "ColumnarSerDe".
3) Run "SHOW CREATE TABLE" and capture the output:
{code}
CREATE TABLE `test`(
  `a` int)
PARTITIONED BY (
  `p` int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
LOCATION 'hdfs://node5.lab.cloudera.com:8020/user/hive/warehouse/case_78732.db/test'
TBLPROPERTIES ('transient_lastDdlTime'='1448343875')
{code}
Note that there is no mention of "ColumnarSerDe".
4) Drop the table, then create it again using the output from 3).
5) Check the output of "DESC FORMATTED test":
{code}
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.hive.ql.io.RCFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
{code}
The SerDe falls back to "LazySimpleSerDe", which is not correct. Any further query that tries to INSERT into or SELECT from this table will fail with errors.
[jira] [Created] (HIVE-12788) Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions
Eric Lin created HIVE-12788: --- Summary: Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions Key: HIVE-12788 URL: https://issues.apache.org/jira/browse/HIVE-12788 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.1.1 Reporter: Eric Lin See the test case below:
{code}
0: jdbc:hive2://localhost:1/default> create table test (a int);
0: jdbc:hive2://localhost:1/default> set hive.optimize.union.remove=true;
No rows affected (0.01 seconds)
0: jdbc:hive2://localhost:1/default> set hive.mapred.supports.subdirectories=true;
No rows affected (0.007 seconds)
0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL SELECT COUNT(1) FROM test;
+--+--+
| _u1._c0 |
+--+--+
+--+--+
{code}
Running the same query without setting hive.mapred.supports.subdirectories and hive.optimize.union.remove to true gives the correct result:
{code}
0: jdbc:hive2://localhost:1/default> SELECT COUNT(1) FROM test UNION ALL SELECT COUNT(1) FROM test;
+--+--+
| _u1._c0 |
+--+--+
| 1|
| 1|
+--+--+
{code}
UNION ALL without the COUNT function works as expected:
{code}
0: jdbc:hive2://localhost:1/default> select * from test UNION ALL SELECT * FROM test;
++--+
| _u1.a |
++--+
| 1 |
| 1 |
++--+
{code}
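As a sanity model of what the correct output should be, here is a tiny Python sketch of the UNION ALL semantics involved (hypothetical; the single-row content of `test` is inferred from the correct output in the report):

```python
def count_rows(rows) -> int:
    # COUNT(1) over one branch of the union
    return len(rows)

test = [(1,)]  # table `test` with one row, as implied by the correct output

# UNION ALL concatenates the result sets of both branches; it must not drop rows.
result = [count_rows(test)] + [count_rows(test)]
# result should be [1, 1]; the buggy plan instead returns an empty result set
```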
[jira] [Created] (HIVE-13160) HS2 unable to load UDFs on startup when HMS is not ready
Eric Lin created HIVE-13160: --- Summary: HS2 unable to load UDFs on startup when HMS is not ready Key: HIVE-13160 URL: https://issues.apache.org/jira/browse/HIVE-13160 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 1.2.1 Reporter: Eric Lin Assignee: Vaibhav Gumashta The error looks like this: {code} 2016-02-24 21:16:09,901 INFO hive.metastore: [main]: Trying to connect to metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083 2016-02-24 21:16:09,971 WARN hive.metastore: [main]: Failed to connect to the MetaStore Server... 2016-02-24 21:16:09,971 INFO hive.metastore: [main]: Waiting 1 seconds before next connection attempt. 2016-02-24 21:16:10,971 INFO hive.metastore: [main]: Trying to connect to metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083 2016-02-24 21:16:10,975 WARN hive.metastore: [main]: Failed to connect to the MetaStore Server... 2016-02-24 21:16:10,976 INFO hive.metastore: [main]: Waiting 1 seconds before next connection attempt. 2016-02-24 21:16:11,976 INFO hive.metastore: [main]: Trying to connect to metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083 2016-02-24 21:16:11,979 WARN hive.metastore: [main]: Failed to connect to the MetaStore Server... 2016-02-24 21:16:11,979 INFO hive.metastore: [main]: Waiting 1 seconds before next connection attempt. 2016-02-24 21:16:12,987 WARN hive.ql.metadata.Hive: [main]: Failed to register all functions. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient . 2016-02-24 21:16:12,995 INFO hive.metastore: [main]: Trying to connect to metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083 2016-02-24 21:16:13,004 WARN hive.metastore: [main]: Failed to connect to the MetaStore Server... 2016-02-24 21:16:13,004 INFO hive.metastore: [main]: Waiting 1 seconds before next connection attempt. 
2016-02-24 21:16:14,004 INFO hive.metastore: [main]: Trying to connect to metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083 2016-02-24 21:16:14,007 WARN hive.metastore: [main]: Failed to connect to the MetaStore Server... 2016-02-24 21:16:14,007 INFO hive.metastore: [main]: Waiting 1 seconds before next connection attempt. 2016-02-24 21:16:15,007 INFO hive.metastore: [main]: Trying to connect to metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083 2016-02-24 21:16:15,010 WARN hive.metastore: [main]: Failed to connect to the MetaStore Server... 2016-02-24 21:16:15,010 INFO hive.metastore: [main]: Waiting 1 seconds before next connection attempt. 2016-02-24 21:16:16,012 INFO org.apache.hive.service.server.HiveServer2: [main]: Shutting down HiveServer2 2016-02-24 21:16:16,014 INFO org.apache.hive.service.server.HiveServer2: [main]: Exception caught when calling stop of HiveServer2 before retrying start java.lang.NullPointerException at org.apache.hive.service.server.HiveServer2.stop(HiveServer2.java:283) at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:351) at org.apache.hive.service.server.HiveServer2.access$400(HiveServer2.java:69) at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:545) at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:418) 2016-02-24 21:16:16,014 WARN org.apache.hive.service.server.HiveServer2: [main]: Error starting HiveServer2 on attempt 1, will retry in 60 seconds .. 
2016-02-24 21:17:16,016 INFO org.apache.hive.service.server.HiveServer2: [main]: Starting HiveServer2 2016-02-24 21:17:16,131 WARN org.apache.hadoop.hive.conf.HiveConf: [main]: HiveConf of name hive.sentry.conf.url does not exist 2016-02-24 21:17:16,132 WARN org.apache.hadoop.hive.conf.HiveConf: [main]: HiveConf of name hive.entity.capture.input.URI does not exist 2016-02-24 21:17:16,150 INFO org.apache.hadoop.security.UserGroupInformation: [main]: Login successful for user hive/host-10-17-81-201.coe.cloudera@yshi.com using keytab file hive.keytab 2016-02-24 21:17:16,150 INFO org.apache.hive.service.cli.CLIService: [main]: SPNego httpUGI not created, spNegoPrincipal: , ketabFile: 2016-02-24 21:17:16,154 INFO hive.metastore: [main]: Trying to connect to metastore with URI thrift://host-10-17-81-201.coe.cloudera.com:9083 2016-02-24 21:17:16,217 INFO hive.metastore: [main]: Opened a connection to metastore, current connections: 1 2016-02-24 21:17:16,218 INFO hive.metastore: [main]: Connected to metastore. {code} And then none of the functions will be available for use as HS2 does not re-register them after HMS is up and ready. This is not desired behaviour, we shouldn't allow HS2 to be in a servicing state if function list is not ready. Or, maybe instead of initi
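The log above shows HS2 giving up after a fixed number of metastore connection attempts and never re-registering functions once HMS comes up. As a generic illustration of the retry-until-ready pattern involved (a hypothetical Python sketch, not Hive's actual Java implementation):

```python
import time

def call_with_retry(action, attempts: int = 3, delay: float = 1.0):
    """Invoke action(), retrying up to `attempts` times with a fixed delay.

    Mirrors the "Waiting 1 seconds before next connection attempt" behaviour in
    the log; the report's point is that function registration should likewise be
    retried (or startup held back) until the metastore is actually reachable.
    """
    last_err = None
    for i in range(attempts):
        try:
            return action()
        except ConnectionError as e:
            last_err = e
            if i < attempts - 1:
                time.sleep(delay)
    raise last_err
```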
[jira] [Created] (HIVE-13372) Hive Macro overwritten when multiple macros are used in one column
Lin Liu created HIVE-13372: -- Summary: Hive Macro overwritten when multiple macros are used in one column Key: HIVE-13372 URL: https://issues.apache.org/jira/browse/HIVE-13372 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Lin Liu Priority: Critical When multiple macros are used in one column, the results of the later macros are overwritten by the result of the first. For example, suppose we have created a table called macro_test with a single STRING column x, containing the rows "a", "bb", "ccc". We also create three macros:
{code}
CREATE TEMPORARY MACRO STRING_LEN(x string) length(x);
CREATE TEMPORARY MACRO STRING_LEN_PLUS_ONE(x string) length(x)+1;
CREATE TEMPORARY MACRO STRING_LEN_PLUS_TWO(x string) length(x)+2;
{code}
When we run the following query:
{code}
SELECT CONCAT(STRING_LEN(x), ":", STRING_LEN_PLUS_ONE(x), ":", STRING_LEN_PLUS_TWO(x)) a FROM macro_test SORT BY a DESC;
{code}
we get:
{code}
3:3:3
2:2:2
1:1:1
{code}
instead of the expected:
{code}
3:4:5
2:3:4
1:2:3
{code}
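As a cross-check of the expected output, here is a hypothetical Python model of the three macros (not Hive code; just the independent-evaluation semantics the report expects):

```python
def string_len(x: str) -> int:
    return len(x)

def string_len_plus_one(x: str) -> int:
    return len(x) + 1

def string_len_plus_two(x: str) -> int:
    return len(x) + 2

def row(x: str) -> str:
    # Each macro should be evaluated independently, not collapsed into the first
    return f"{string_len(x)}:{string_len_plus_one(x)}:{string_len_plus_two(x)}"

# SORT BY a DESC over the three sample rows
rows = sorted((row(x) for x in ["a", "bb", "ccc"]), reverse=True)
# rows == ["3:4:5", "2:3:4", "1:2:3"], matching the expected result
```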
[jira] [Created] (HIVE-13382) Dynamic partitioning for INSERT OVERWRITE Query is slow
Lin Liu created HIVE-13382: -- Summary: Dynamic partitioning for INSERT OVERWRITE Query is slow Key: HIVE-13382 URL: https://issues.apache.org/jira/browse/HIVE-13382 Project: Hive Issue Type: Bug Components: Hive Reporter: Lin Liu Priority: Blocker In our case we execute a query to update the data of a multi-level partitioned table. The MR jobs finish quickly, but the move stage takes a very long time. We are currently using Hive 1.2.1 with HIVE-11940 applied.
[jira] [Created] (HIVE-13780) Allow user to update AVRO table schema via command even if the table's schema was defined through a schema file
Eric Lin created HIVE-13780: --- Summary: Allow user to update AVRO table schema via command even if the table's schema was defined through a schema file Key: HIVE-13780 URL: https://issues.apache.org/jira/browse/HIVE-13780 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 2.0.0 Reporter: Eric Lin Priority: Minor If a table is defined as below:
{code}
CREATE TABLE test STORED AS AVRO TBLPROPERTIES ('avro.schema.url'='/tmp/schema.json');
{code}
and the user runs:
{code}
ALTER TABLE test CHANGE COLUMN col1 col1 STRING COMMENT 'test comment';
{code}
the query returns without any warning but has no effect on the table. It would be good to allow the user to ALTER the table (add/change columns, update comments, etc.) even though the schema is defined through a schema file.
[jira] [Created] (HIVE-14482) Drop table partition is not audit logged in HMS
Eric Lin created HIVE-14482: --- Summary: Drop table partition is not audit logged in HMS Key: HIVE-14482 URL: https://issues.apache.org/jira/browse/HIVE-14482 Project: Hive Issue Type: Improvement Affects Versions: 2.1.0 Reporter: Eric Lin Assignee: Eric Lin Priority: Minor When running: {code} ALTER TABLE test DROP PARTITION (b=140); {code} I only see the following in the HMS log: {code} 2016-08-08 23:12:34,081 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-4-thread-2]: 2016-08-08 23:12:34,082 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=case_104408 tbl=test 2016-08-08 23:12:34,082 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=case_104408 tbl=test 2016-08-08 23:12:34,094 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-4-thread-2]: 2016-08-08 23:12:34,095 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-4-thread-2]: 2016-08-08 23:12:34,095 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_partitions_by_expr : db=case_104408 tbl=test 2016-08-08 23:12:34,096 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_partitions_by_expr : db=case_104408 tbl=test 2016-08-08 23:12:34,112 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-4-thread-2]: 2016-08-08 23:12:34,172 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-4-thread-2]: 2016-08-08 23:12:34,173 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=case_104408 tbl=test 2016-08-08 23:12:34,173 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=case_104408 tbl=test 2016-08-08 23:12:34,186 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-4-thread-2]: 
2016-08-08 23:12:34,186 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-4-thread-2]: 2016-08-08 23:12:34,187 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=case_104408 tbl=test 2016-08-08 23:12:34,187 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: ugi=hive ip=xx.xx.xxx.xxxcmd=source:xx.xx.xxx.xxx get_table : db=case_104408 tbl=test 2016-08-08 23:12:34,199 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-4-thread-2]: 2016-08-08 23:12:34,203 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-4-thread-2]: 2016-08-08 23:12:34,215 INFO org.apache.hadoop.hive.metastore.ObjectStore: [pool-4-thread-2]: JDO filter pushdown cannot be used: Filtering is supported only on partition keys of type string 2016-08-08 23:12:34,226 ERROR org.apache.hadoop.hdfs.KeyProviderCache: [pool-4-thread-2]: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !! 2016-08-08 23:12:34,239 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-4-thread-2]: dropPartition() will move partition-directories to trash-directory. 2016-08-08 23:12:34,239 INFO hive.metastore.hivemetastoressimpl: [pool-4-thread-2]: deleting hdfs://:8020/user/hive/warehouse/default/test/b=140 2016-08-08 23:12:34,247 INFO org.apache.hadoop.fs.TrashPolicyDefault: [pool-4-thread-2]: Moved: 'hdfs://:8020/user/hive/warehouse/default/test/b=140' to trash at: hdfs://:8020/user/hive/.Trash/Current/user/hive/warehouse/default/test/b=140 2016-08-08 23:12:34,247 INFO hive.metastore.hivemetastoressimpl: [pool-4-thread-2]: Moved to trash: hdfs://:8020/user/hive/warehouse/default/test/b=140 2016-08-08 23:12:34,247 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-4-thread-2]: {code} There is no entry in the "HiveMetaStore.audit" to show that partition b=140 was dropped. 
When we add a new partition, we can see the following:
{code}
2016-08-08 23:04:48,534 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: ugi=hive ip=xx.xx.xxx.xxx cmd=source:xx.xx.xxx.xxx append_partition : db=default tbl=test[130]
{code}
Ideally we should see a similar message when dropping partitions.
[jira] [Created] (HIVE-14537) Please add HMS audit logs for ADD and CHANGE COLUMN operations
Eric Lin created HIVE-14537: --- Summary: Please add HMS audit logs for ADD and CHANGE COLUMN operations Key: HIVE-14537 URL: https://issues.apache.org/jira/browse/HIVE-14537 Project: Hive Issue Type: Improvement Reporter: Eric Lin Priority: Minor Currently, if you run ALTER TABLE test ADD COLUMNS (c int), the only audit log we can see is:
{code}
2016-08-09T13:29:56,411 INFO [pool-6-thread-2]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(754)) - 2: source:127.0.0.1 alter_table: db=default tbl=test newtbl=test
{code}
This is not enough to tell which columns were added or changed. It would be useful to add that information; ideally we would see:
{code}
2016-08-09T13:29:56,411 INFO [pool-6-thread-2]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(754)) - 2: source:127.0.0.1 alter_table: db=default tbl=test newtbl=test newCol=c[int]
{code}
[jira] [Created] (HIVE-14903) from_utc_timestamp function issue for CET daylight savings
Eric Lin created HIVE-14903: --- Summary: from_utc_timestamp function issue for CET daylight savings Key: HIVE-14903 URL: https://issues.apache.org/jira/browse/HIVE-14903 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 2.0.1 Reporter: Eric Lin Priority: Minor Based on https://en.wikipedia.org/wiki/Central_European_Summer_Time, summer time runs from 1:00 UTC on the last Sunday of March until 1:00 UTC on the last Sunday of October. See the test case below.
Impala:
{code}
[host-10-17-101-195.coe.cloudera.com:25003] > select from_utc_timestamp('2016-10-30 00:30:00','CET');
Query: select from_utc_timestamp('2016-10-30 00:30:00','CET')
+--+
| from_utc_timestamp('2016-10-30 00:30:00', 'cet') |
+--+
| 2016-10-30 01:30:00 |
+--+
{code}
Hive:
{code}
0: jdbc:hive2://host-10-17-101-195.coe.cloude> select from_utc_timestamp('2016-10-30 00:30:00','CET');
INFO : OK
++--+
| _c0 |
++--+
| 2016-10-30 01:30:00.0 |
++--+
{code}
MySQL:
{code}
mysql> SELECT CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' );
+---+
| CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' ) |
+---+
| 2016-10-30 02:30:00 |
+---+
{code}
At 00:30 UTC the daylight saving period has not yet ended, so the time difference should still be 2 hours rather than 1; only MySQL returned the correct result.
At 01:30 UTC, all results are correct:
Impala:
{code}
Query: select from_utc_timestamp('2016-10-30 01:30:00','CET')
+--+
| from_utc_timestamp('2016-10-30 01:30:00', 'cet') |
+--+
| 2016-10-30 02:30:00 |
+--+
Fetched 1 row(s) in 0.01s
{code}
Hive:
{code}
++--+
| _c0 |
++--+
| 2016-10-30 02:30:00.0 |
++--+
1 row selected (0.252 seconds)
{code}
MySQL:
{code}
mysql> SELECT CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' );
+---+
| CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' ) |
+---+
| 2016-10-30 02:30:00 |
+---+
1 row in set (0.00 sec)
{code}
This looks like a bug.
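For a quick sanity check of the expected values, a small Python sketch using the IANA "CET" zone (Python 3.9+; independent of Hive, just verifying the report's arithmetic):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # requires Python 3.9+

def utc_to_cet(ts: str) -> str:
    """Convert a 'YYYY-MM-DD HH:MM:SS' UTC timestamp to local CET/CEST time."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return dt.astimezone(ZoneInfo("CET")).strftime("%Y-%m-%d %H:%M:%S")

# Before the 2016-10-30 01:00 UTC switch the offset is +2 (CEST), after it +1 (CET),
# so both inputs map to the same local wall-clock time:
utc_to_cet("2016-10-30 00:30:00")  # 2016-10-30 02:30:00
utc_to_cet("2016-10-30 01:30:00")  # 2016-10-30 02:30:00
```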
[jira] [Created] (HIVE-15166) Provide beeline option to set the jline history max size
Eric Lin created HIVE-15166: --- Summary: Provide beeline option to set the jline history max size Key: HIVE-15166 URL: https://issues.apache.org/jira/browse/HIVE-15166 Project: Hive Issue Type: Improvement Components: Beeline Affects Versions: 2.1.0 Reporter: Eric Lin Currently Beeline does not provide an option to limit the maximum size of its history file. When individual queries are very large, they can flood the history file and slow Beeline down on startup and shutdown.